LKML Archive on lore.kernel.org
* [PATCH v2] hungtask: add filter kthread
@ 2021-08-05 13:47 chenguanyou
2021-08-05 14:11 ` Michal Hocko
0 siblings, 1 reply; 9+ messages in thread
From: chenguanyou @ 2021-08-05 13:47 UTC (permalink / raw)
To: linux-kernel
Cc: akpm, keescook, mhocko, lukas.bulwahn, vbabka, gpiccoli, chenguanyou
Some kernel threads are always in D state. When hung_task is enabled,
it misreports them as hung, so we should skip them to narrow the scope.
For example, on MediaTek platforms:
root 435 435 2 0 0 mtk_lpm_monitor_thread 0 D LPM-0
root 436 436 2 0 0 mtk_lpm_monitor_thread 0 D LPM-1
root 437 437 2 0 0 mtk_lpm_monitor_thread 0 D LPM-2
root 438 438 2 0 0 mtk_lpm_monitor_thread 0 D LPM-3
root 439 439 2 0 0 mtk_lpm_monitor_thread 0 D LPM-4
root 440 440 2 0 0 mtk_lpm_monitor_thread 0 D LPM-5
root 441 441 2 0 0 mtk_lpm_monitor_thread 0 D LPM-6
root 442 442 2 0 0 mtk_lpm_monitor_thread 0 D LPM-7
Signed-off-by: chenguanyou <chenguanyou@xiaomi.com>
---
Documentation/admin-guide/sysctl/kernel.rst | 10 ++++++++++
include/linux/sched/sysctl.h | 1 +
kernel/hung_task.c | 8 ++++++++
kernel/sysctl.c | 9 +++++++++
lib/Kconfig.debug | 15 +++++++++++++++
5 files changed, 43 insertions(+)
diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 68b21395a743..3c7c74b26d95 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -405,6 +405,16 @@ This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
-1: report an infinite number of warnings.
+hung_task_filter_kthread
+========================
+
+Controls whether the hung task detector skips kernel threads.
+This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
+
+= =========================================================
+0 Check kernel threads.
+1 Skip kernel threads.
+= =========================================================
hyperv_record_panic_msg
=======================
diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
index db2c0f34aaaf..2b8b01b57559 100644
--- a/include/linux/sched/sysctl.h
+++ b/include/linux/sched/sysctl.h
@@ -19,6 +19,7 @@ extern unsigned int sysctl_hung_task_panic;
extern unsigned long sysctl_hung_task_timeout_secs;
extern unsigned long sysctl_hung_task_check_interval_secs;
extern int sysctl_hung_task_warnings;
+extern unsigned int sysctl_hung_task_filter_kthread;
int proc_dohung_task_timeout_secs(struct ctl_table *table, int write,
void *buffer, size_t *lenp, loff_t *ppos);
#else
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 396ebaebea3f..74ad75c2dde8 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -48,6 +48,11 @@ unsigned long __read_mostly sysctl_hung_task_timeout_secs = CONFIG_DEFAULT_HUNG_
*/
unsigned long __read_mostly sysctl_hung_task_check_interval_secs;
+/*
+ * Non-zero means kernel threads are not checked
+ */
+unsigned int __read_mostly sysctl_hung_task_filter_kthread = CONFIG_DEFAULT_HUNG_TASK_FILTER_KTHREAD;
+
int __read_mostly sysctl_hung_task_warnings = 10;
static int __read_mostly did_panic;
@@ -88,6 +93,9 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
{
unsigned long switch_count = t->nvcsw + t->nivcsw;
+ if (unlikely(sysctl_hung_task_filter_kthread && t->flags & PF_KTHREAD))
+ return;
+
/*
* Ensure the task is not frozen.
* Also, skip vfork and any other user process that freezer should skip.
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index d4a78e08f6d8..62067b9db486 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2513,6 +2513,15 @@ static struct ctl_table kern_table[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = &neg_one,
},
+ {
+ .procname = "hung_task_filter_kthread",
+ .data = &sysctl_hung_task_filter_kthread,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_ONE,
+ },
#endif
#ifdef CONFIG_RT_MUTEXES
{
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 678c13967580..d7063f955987 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1110,6 +1110,21 @@ config DEFAULT_HUNG_TASK_TIMEOUT
A timeout of 0 disables the check. The default is two minutes.
Keeping the default should be fine in most cases.
+config DEFAULT_HUNG_TASK_FILTER_KTHREAD
+ int "Default filter kthread for hung task"
+ depends on DETECT_HUNG_TASK
+ range 0 1
+ default 0
+ help
+ This option sets the default for whether the hung task detector
+ skips kernel threads that are in TASK_UNINTERRUPTIBLE state.
+
+ It can be adjusted at runtime via the kernel.hung_task_filter_kthread
+ sysctl or by writing a value to
+ /proc/sys/kernel/hung_task_filter_kthread.
+
+ A value of 1 disables the check for kernel threads.
+
config BOOTPARAM_HUNG_TASK_PANIC
bool "Panic (Reboot) On Hung Tasks"
depends on DETECT_HUNG_TASK
--
2.17.1
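For readers who want to see the effect of the new early return in isolation, here is a minimal user-space sketch of the decision the patch changes. It is not the kernel code: struct task_model, filter_kthread and would_report_hung() are hypothetical names, the flag values are copied from kernel headers of roughly this era and should be treated as assumptions, and the timeout bookkeeping of the real detector is left out.

```c
#include <stdbool.h>

/* Values mirror include/linux/sched.h around v5.14; treat them as assumptions. */
#define TASK_UNINTERRUPTIBLE 0x0002u
#define PF_KTHREAD           0x00200000u

/* Hypothetical, trimmed-down stand-in for struct task_struct. */
struct task_model {
    unsigned int state;               /* scheduler state bits */
    unsigned int flags;               /* PF_* flags */
    unsigned long switch_count;       /* nvcsw + nivcsw at this scan */
    unsigned long last_switch_count;  /* value recorded at the previous scan */
};

/* Stand-in for the sysctl added by the patch: 0 = check kthreads, 1 = skip. */
static unsigned int filter_kthread;

/* Returns true when this simplified model would report the task as hung. */
static bool would_report_hung(const struct task_model *t)
{
    if (t->state != TASK_UNINTERRUPTIBLE)
        return false;   /* only plain D-state tasks are considered */
    if (filter_kthread && (t->flags & PF_KTHREAD))
        return false;   /* the early return the patch adds */
    /* No context switch since the previous scan means no progress. */
    return t->switch_count == t->last_switch_count;
}
```

With filter_kthread set to 1, a stuck kernel thread is never reported while a stuck user task still is, which is exactly the behavior the replies in this thread object to.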
* Re: [PATCH v2] hungtask: add filter kthread
2021-08-05 13:47 [PATCH v2] hungtask: add filter kthread chenguanyou
@ 2021-08-05 14:11 ` Michal Hocko
2021-08-05 14:47 ` chenguanyou
0 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2021-08-05 14:11 UTC (permalink / raw)
To: chenguanyou
Cc: linux-kernel, akpm, keescook, lukas.bulwahn, vbabka, gpiccoli,
chenguanyou
On Thu 05-08-21 21:47:47, chenguanyou wrote:
> Some kernel threads are always in D state. When hung_task is enabled,
> it misreports them as hung, so we should skip them to narrow the scope.
>
> For example, on MediaTek platforms:
> root 435 435 2 0 0 mtk_lpm_monitor_thread 0 D LPM-0
> root 436 436 2 0 0 mtk_lpm_monitor_thread 0 D LPM-1
> root 437 437 2 0 0 mtk_lpm_monitor_thread 0 D LPM-2
> root 438 438 2 0 0 mtk_lpm_monitor_thread 0 D LPM-3
> root 439 439 2 0 0 mtk_lpm_monitor_thread 0 D LPM-4
> root 440 440 2 0 0 mtk_lpm_monitor_thread 0 D LPM-5
> root 441 441 2 0 0 mtk_lpm_monitor_thread 0 D LPM-6
> root 442 442 2 0 0 mtk_lpm_monitor_thread 0 D LPM-7
A similar approach has been proposed in the past (sorry I do not have
links handy) and always deemed a wrong way to approach the problem.
Either those kernel threads should be fixed to use less sleep or
annotate the sleep properly (TASK_IDLE).
--
Michal Hocko
SUSE Labs
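To illustrate the TASK_IDLE suggestion above: the kernel defines TASK_IDLE as TASK_UNINTERRUPTIBLE combined with TASK_NOLOAD, and the hung task scan compares the task state for equality with TASK_UNINTERRUPTIBLE, so an idle-annotated sleep is not examined at all. The sketch below models only that comparison in user space. The numeric values are copied from kernel headers of roughly this era and are assumptions, and scanned_by_hung_task() is a hypothetical helper name.

```c
#include <stdbool.h>

/* Values mirror include/linux/sched.h around v5.14; treat them as assumptions. */
#define TASK_UNINTERRUPTIBLE 0x0002u
#define TASK_NOLOAD          0x0400u
#define TASK_IDLE            (TASK_UNINTERRUPTIBLE | TASK_NOLOAD)

/*
 * Models the state test done by the scan: equality, not a bit test,
 * so any extra state bit keeps the task out of the check.
 */
static bool scanned_by_hung_task(unsigned int state)
{
    return state == TASK_UNINTERRUPTIBLE;
}
```

In a driver, this would mean having the monitor thread sleep through an idle variant such as wait_event_idle() or schedule_timeout_idle() rather than a plain uninterruptible wait. Whether that fits the MediaTek threads shown above is not something this thread establishes.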
* Re:[PATCH v2] hungtask: add filter kthread
2021-08-05 14:11 ` Michal Hocko
@ 2021-08-05 14:47 ` chenguanyou
2021-08-05 23:43 ` [PATCH " Andrew Morton
0 siblings, 1 reply; 9+ messages in thread
From: chenguanyou @ 2021-08-05 14:47 UTC (permalink / raw)
To: mhocko
Cc: akpm, chenguanyou9338, chenguanyou, gpiccoli, keescook,
linux-kernel, lukas.bulwahn, vbabka
> Either those kernel threads should be fixed to use less sleep or
> annotate the sleep properly (TASK_IDLE).
The API is for debugging, for cases where we do not need to care about kernel thread state.
Guanyou.Chen
* Re: [PATCH v2] hungtask: add filter kthread
2021-08-05 14:47 ` chenguanyou
@ 2021-08-05 23:43 ` Andrew Morton
2021-08-07 13:16 ` chenguanyou
0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2021-08-05 23:43 UTC (permalink / raw)
To: chenguanyou
Cc: mhocko, chenguanyou, gpiccoli, keescook, linux-kernel,
lukas.bulwahn, vbabka
On Thu, 5 Aug 2021 22:47:20 +0800 chenguanyou <chenguanyou9338@gmail.com> wrote:
> > Either those kernel threads should be fixed to use less sleep or
> > annotate the sleep properly (TASK_IDLE).
>
> The API is for debugging, for cases where we do not need to care about kernel thread state.
>
Please explain this point in more detail?
* Re:[PATCH v2] hungtask: add filter kthread
2021-08-05 23:43 ` [PATCH " Andrew Morton
@ 2021-08-07 13:16 ` chenguanyou
2021-08-09 9:17 ` [PATCH " Michal Hocko
0 siblings, 1 reply; 9+ messages in thread
From: chenguanyou @ 2021-08-07 13:16 UTC (permalink / raw)
To: akpm
Cc: chenguanyou9338, chenguanyou, gpiccoli, keescook, linux-kernel,
lukas.bulwahn, mhocko, vbabka
> Please explain this point in more detail?
In my work, the root cause of most deadlocks is in user threads.
For example:
PID: 711 TASK: ffffffc13eb71d00 CPU: 0 COMMAND: "sensors@2.0-ser"
#0 [ffffff80251cbcb0] __switch_to at ffffff80080866c4
#1 [ffffff80251cbd20] __schedule at ffffff80090c0940
#2 [ffffff80251cbd80] schedule_preempt_disabled at ffffff80090c0e4c
#3 [ffffff80251cbde0] __mutex_lock at ffffff80090c2e58
#4 [ffffff80251cbe40] __mutex_lock_slowpath at ffffff80090c1f78
#5 [ffffff80251cbe50] mutex_lock at ffffff80090c1f60
#6 [ffffff80251cbe60] __fdget_pos at ffffff800829ac84
#7 [ffffff80251cbe90] sys_write at ffffff8008270550
#8 [ffffff80251cbff0] el0_svc_naked at ffffff8008083fbc
PID: 843 TASK: ffffffc135832b80 CPU: 2 COMMAND: "sensors@2.0-ser"
#0 [ffffff802554bb30] __switch_to at ffffff80080866c4
#1 [ffffff802554bba0] __schedule at ffffff80090c0940
#2 [ffffff802554bc00] schedule at ffffff80090c0d54
#3 [ffffff802554bc50] xxx_sensor_show at ffffff8008bc043c
#4 [ffffff802554bc80] dev_attr_show at ffffff8008668ce0
#5 [ffffff802554bca0] sysfs_kf_seq_show at ffffff8008314e04
#6 [ffffff802554bce0] kernfs_seq_show at ffffff8008314314
#7 [ffffff802554bd10] seq_read at ffffff80082a250c
#8 [ffffff802554bd70] kernfs_fop_read at ffffff80083135b8
#9 [ffffff802554be20] __vfs_read at ffffff800826fcc0
#10 [ffffff802554be40] vfs_read at ffffff800826ff08
#11 [ffffff802554be90] sys_read at ffffff80082704c4
#12 [ffffff802554bff0] el0_svc_naked at ffffff8008083fbc
The root cause is a deadlock caused by both threads using the same fd, and PID 843's file operation is a blocking one.
If we want hung_task to trigger a panic on the first real hang,
we must avoid checking kernel threads on some platforms (e.g. MediaTek),
because those kernel threads cause false positives.
Guanyou.Chen
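The two backtraces above can be read as one thread holding the per-file position lock while it blocks in the sysfs show callback, and the other waiting for that lock in __fdget_pos. The following user-space sketch models just that ownership relation. struct file_model, fdget_pos_model() and fdput_pos_model() are hypothetical names invented for illustration and do not reproduce the kernel's locking code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a shared struct file and its position lock. */
struct file_model {
    int pos_lock_owner;   /* PID holding the position lock, 0 if free */
};

/*
 * Try to take the position lock for pid.
 * A false return marks the point where a real task would sleep in D state.
 */
static bool fdget_pos_model(struct file_model *f, int pid)
{
    if (f->pos_lock_owner != 0)
        return false;
    f->pos_lock_owner = pid;
    return true;
}

/* Release the lock, but only if pid is the current owner. */
static void fdput_pos_model(struct file_model *f, int pid)
{
    if (f->pos_lock_owner == pid)
        f->pos_lock_owner = 0;
}
```

In this model PID 843 takes the lock for its read and never releases it because the read blocks, so PID 711's write on the same fd cannot proceed until the read returns.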
* Re: [PATCH v2] hungtask: add filter kthread
2021-08-07 13:16 ` chenguanyou
@ 2021-08-09 9:17 ` Michal Hocko
2021-08-09 11:52 ` chenguanyou
0 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2021-08-09 9:17 UTC (permalink / raw)
To: chenguanyou
Cc: akpm, chenguanyou, gpiccoli, keescook, linux-kernel,
lukas.bulwahn, vbabka
On Sat 07-08-21 21:16:00, chenguanyou wrote:
> > Please explain this point in more detail?
>
> In my work, the root cause of most deadlocks is in user threads.
>
> For example:
> PID: 711 TASK: ffffffc13eb71d00 CPU: 0 COMMAND: "sensors@2.0-ser"
> #0 [ffffff80251cbcb0] __switch_to at ffffff80080866c4
> #1 [ffffff80251cbd20] __schedule at ffffff80090c0940
> #2 [ffffff80251cbd80] schedule_preempt_disabled at ffffff80090c0e4c
> #3 [ffffff80251cbde0] __mutex_lock at ffffff80090c2e58
> #4 [ffffff80251cbe40] __mutex_lock_slowpath at ffffff80090c1f78
> #5 [ffffff80251cbe50] mutex_lock at ffffff80090c1f60
> #6 [ffffff80251cbe60] __fdget_pos at ffffff800829ac84
> #7 [ffffff80251cbe90] sys_write at ffffff8008270550
> #8 [ffffff80251cbff0] el0_svc_naked at ffffff8008083fbc
>
> PID: 843 TASK: ffffffc135832b80 CPU: 2 COMMAND: "sensors@2.0-ser"
> #0 [ffffff802554bb30] __switch_to at ffffff80080866c4
> #1 [ffffff802554bba0] __schedule at ffffff80090c0940
> #2 [ffffff802554bc00] schedule at ffffff80090c0d54
> #3 [ffffff802554bc50] xxx_sensor_show at ffffff8008bc043c
> #4 [ffffff802554bc80] dev_attr_show at ffffff8008668ce0
> #5 [ffffff802554bca0] sysfs_kf_seq_show at ffffff8008314e04
> #6 [ffffff802554bce0] kernfs_seq_show at ffffff8008314314
> #7 [ffffff802554bd10] seq_read at ffffff80082a250c
> #8 [ffffff802554bd70] kernfs_fop_read at ffffff80083135b8
> #9 [ffffff802554be20] __vfs_read at ffffff800826fcc0
> #10 [ffffff802554be40] vfs_read at ffffff800826ff08
> #11 [ffffff802554be90] sys_read at ffffff80082704c4
> #12 [ffffff802554bff0] el0_svc_naked at ffffff8008083fbc
>
> The root cause is a deadlock caused by both threads using the same fd, and PID 843's file operation is a blocking one.
> If we want hung_task to trigger a panic on the first real hang,
> we must avoid checking kernel threads on some platforms (e.g. MediaTek),
> because those kernel threads cause false positives.
This still suggests that the primary purpose of the interface is to
paper over real problems that should be fixed instead.
--
Michal Hocko
SUSE Labs
* Re:[PATCH v2] hungtask: add filter kthread
2021-08-09 9:17 ` [PATCH " Michal Hocko
@ 2021-08-09 11:52 ` chenguanyou
2021-08-09 11:57 ` [PATCH " Michal Hocko
0 siblings, 1 reply; 9+ messages in thread
From: chenguanyou @ 2021-08-09 11:52 UTC (permalink / raw)
To: mhocko
Cc: akpm, chenguanyou9338, chenguanyou, gpiccoli, keescook,
linux-kernel, lukas.bulwahn, vbabka
> This still suggests that the primary purpose of the interface is to
> paper over real problems that should be fixed instead.
I know, but I don't care about the kernel threads' state because it doesn't need to be fixed.
The API is only for debugging.
Guanyou.Chen
* Re: [PATCH v2] hungtask: add filter kthread
2021-08-09 11:52 ` chenguanyou
@ 2021-08-09 11:57 ` Michal Hocko
2021-08-09 12:24 ` chenguanyou
0 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2021-08-09 11:57 UTC (permalink / raw)
To: chenguanyou
Cc: akpm, chenguanyou, gpiccoli, keescook, linux-kernel,
lukas.bulwahn, vbabka
On Mon 09-08-21 19:52:38, chenguanyou wrote:
> > This still suggests that the primary purpose of the interface is to
> > paper over real problems that should be fixed instead.
>
> I know, but I don't care about the kernel threads' state because it doesn't need to be fixed.
> The API is only for debugging.
Then I am afraid you have to keep this a local non-upstream feature in
your kernel. This doesn't look like an upstream material to me.
--
Michal Hocko
SUSE Labs
* Re:[PATCH v2] hungtask: add filter kthread
2021-08-09 11:57 ` [PATCH " Michal Hocko
@ 2021-08-09 12:24 ` chenguanyou
0 siblings, 0 replies; 9+ messages in thread
From: chenguanyou @ 2021-08-09 12:24 UTC (permalink / raw)
To: mhocko
Cc: akpm, chenguanyou9338, chenguanyou, gpiccoli, keescook,
linux-kernel, lukas.bulwahn, vbabka
> Then I am afraid you have to keep this a local non-upstream feature in
> your kernel. This doesn't look like an upstream material to me.
Thank you for your reply.
Guanyou.Chen