Linux-Fsdevel Archive on lore.kernel.org
* PROBLEM: Long Workqueue delays V2
@ 2020-08-18 11:58 Jim Baxter
  2020-08-18 16:45 ` Jim Baxter
  2020-08-19 13:12 ` Jim Baxter
  0 siblings, 2 replies; 5+ messages in thread
From: Jim Baxter @ 2020-08-18 11:58 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, linux-mm
  Cc: Resch Carsten (CM/ESO6), Rosca, Eugeniu (ADITG/ESB)

I am asking this question again to include the fs-devel list.


We have an issue where the kernel's shared workqueue overloads CPU 0
when we disconnect a USB stick.

This results in other items on the shared workqueue being delayed by
around 6.5 seconds with a default kernel configuration and 2.3 seconds
with a config tailored for our RCar embedded platform.



We first noticed this issue on custom hardware and have recreated it
on an RCar Starter Kit using a test module [1] to replicate the
behaviour; the module reports any delay greater than 9ms.

For the test we place a 4GB random file on a USB stick and perform
the following steps.
The stick is mounted read-only and we copy data from it:

- Mount the stick.
mount -o ro,remount /dev/sda1

- Load the Module:
# taskset -c 0 modprobe latency-mon

- Copy large amount of data from the stick:
# dd if=/run/media/sda1/sample.txt of=/dev/zero
[ 1437.517603] DELAY: 10
8388607+1 records in
8388607+1 records out


- Disconnect the USB stick:
[ 1551.796792] usb 2-1: USB disconnect, device number 2
[ 1558.625517] DELAY: 6782


The DELAY value (6782) is in milliseconds.



Unmounting the stick first (umount) prevents the issue, but
unfortunately an unmount is not guaranteed in our particular system.


From my analysis, the hub_event work item's kworker/0:1+usb thread uses
around 98% of the CPU.

I traced the workqueue:workqueue_queue_work tracepoint while unplugging
the USB stick, and no particular work function is queued significantly
more often than the others for the kworker/0:1+usb thread.


Using perf, I identified that the hub_event work item spends a lot of
time in invalidate_partition(). I have included a cut-down version of
the captured perf data in [2], which shows the functions where the
kworker spends most of its time.


I am aware there will be delays on the shared workqueue, but are delays
of this size considered normal?


Is there any way to mitigate the delay, or to identify exactly where it
comes from? I am unsure whether this is a memory-management or a
filesystem issue.


Thank you for your help.

Thanks,
Jim Baxter

[1] Test Module:
// SPDX-License-Identifier: GPL-2.0
/*
 * Simple WQ latency monitoring
 *
 * Copyright (C) 2020 Advanced Driver Information Technology.
 */

#include <linux/init.h>
#include <linux/ktime.h>
#include <linux/module.h>

#define PERIOD_MS 100

static struct delayed_work wq;
static u64 us_save;

static void wq_cb(struct work_struct *work)
{
	u64 us = ktime_to_us(ktime_get());
	s64 ms_late;

	if (!us_save)
		goto skip_print;

	/*
	 * Extra delay beyond the nominal period, in milliseconds. A
	 * signed type avoids u64 underflow if the callback runs early.
	 */
	ms_late = (s64)(us - us_save) / 1000 - PERIOD_MS;
	if (ms_late > 9)
		pr_crit("DELAY: %lld\n", ms_late);

skip_print:
	us_save = us;
	schedule_delayed_work(&wq, msecs_to_jiffies(PERIOD_MS));
}

static int __init latency_mon_init(void)
{
	us_save = 0;
	INIT_DELAYED_WORK(&wq, wq_cb);
	schedule_delayed_work(&wq, msecs_to_jiffies(PERIOD_MS));

	return 0;
}

static void __exit latency_mon_exit(void)
{
	cancel_delayed_work_sync(&wq);
	pr_info("%s\n", __func__);
}

module_init(latency_mon_init);
module_exit(latency_mon_exit);
MODULE_AUTHOR("Eugeniu Rosca <erosca@de.adit-jv.com>");
MODULE_LICENSE("GPL");


[2] perf trace:
    95.22%     0.00%  kworker/0:2-eve  [kernel.kallsyms]
    |
    ---ret_from_fork
       kthread
       worker_thread
       |          
        --95.15%--process_one_work
		  |          
		   --94.99%--hub_event
			 |          
			  --94.99%--usb_disconnect
			  <snip>
				|  
				--94.90%--invalidate_partition
				   __invalidate_device
				   |          
				   |--64.55%--invalidate_bdev
				   |  |          
				   |   --64.13%--invalidate_mapping_pages
				   |     |          
				   |     |--24.09%--invalidate_inode_page
				   |     |   |          
				   |     |   --23.44%--remove_mapping
				   |     |     |          
				   |     |      --23.20%--__remove_mapping
				   |     |        |          
				   |     |         --21.90%--arch_local_irq_restore
				   |     |          
				   |     |--22.44%--arch_local_irq_enable
				   |          
					--30.35%--shrink_dcache_sb 
					<snip>
					  |      
					  --30.17%--truncate_inode_pages_range



* Re: PROBLEM: Long Workqueue delays V2
  2020-08-18 11:58 PROBLEM: Long Workqueue delays V2 Jim Baxter
@ 2020-08-18 16:45 ` Jim Baxter
  2020-08-19 13:12 ` Jim Baxter
  1 sibling, 0 replies; 5+ messages in thread
From: Jim Baxter @ 2020-08-18 16:45 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, linux-mm
  Cc: Resch Carsten (CM/ESO6), Rosca, Eugeniu (ADITG/ESB)

Added the linux-block list, which may also be relevant to this issue.

On 18 Aug 2020 12:58, Jim Baxter wrote:

> [original message snipped]


* Re: PROBLEM: Long Workqueue delays V2
  2020-08-18 11:58 PROBLEM: Long Workqueue delays V2 Jim Baxter
  2020-08-18 16:45 ` Jim Baxter
@ 2020-08-19 13:12 ` Jim Baxter
  2020-08-27 12:06   ` Jim Baxter
  1 sibling, 1 reply; 5+ messages in thread
From: Jim Baxter @ 2020-08-19 13:12 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, linux-mm, linux-block
  Cc: Resch Carsten (CM/ESO6), Rosca, Eugeniu (ADITG/ESB)

Added the linux-block list, which may also be relevant to this issue.

-------- Original Message --------
Subject: PROBLEM: Long Workqueue delays V2
From: Jim Baxter <jim_baxter@mentor.com>
To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
CC: "Resch Carsten (CM/ESO6)" <Carsten.Resch@de.bosch.com>, "Rosca, Eugeniu (ADITG/ESB)" <erosca@de.adit-jv.com>
Date: Tue, 18 Aug 2020 12:58:13 +0100

> [original message snipped]


* Re: PROBLEM: Long Workqueue delays V2
  2020-08-19 13:12 ` Jim Baxter
@ 2020-08-27 12:06   ` Jim Baxter
  2020-09-08 15:54     ` Vlastimil Babka
  0 siblings, 1 reply; 5+ messages in thread
From: Jim Baxter @ 2020-08-27 12:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, linux-mm, linux-block
  Cc: linux-usb, Frkuska, Joshua, Resch Carsten (CM/ESO6),
	Rosca, Eugeniu (ADITG/ESB),
	Craske, Mark, Brown, Michael

Does anyone have any ideas on how to investigate this delay further?

Comparing the perf output for unplugging the USB stick with that for
umount (which does not cause these delays in other workqueues), the main
difference is that the problem case executes the code in
invalidate_mapping_pages(), and a large part of that time is attributed
to arch_local_irq_restore(), which is part of releasing a lock. I would
usually expect delays to occur while waiting to acquire a lock, not
while releasing one.

	--94.90%--invalidate_partition
	   __invalidate_device
	   |          
	   |--64.55%--invalidate_bdev
	   |  |          
	   |   --64.13%--invalidate_mapping_pages
	   |     |          
	   |     |--24.09%--invalidate_inode_page
	   |     |   |          
	   |     |   --23.44%--remove_mapping
	   |     |     |          
	   |     |      --23.20%--__remove_mapping
	   |     |        |          
	   |     |         --21.90%--arch_local_irq_restore
	   |     |          
	   |     |--22.44%--arch_local_irq_enable

Best regards,
Jim

-------- Original Message --------
Subject: Re: PROBLEM: Long Workqueue delays V2
From: Jim Baxter <jim_baxter@mentor.com>
To: 
Date: Wed Aug 19 2020 14:12:24 GMT+0100 (British Summer Time)

> [earlier messages snipped]


* Re: PROBLEM: Long Workqueue delays V2
  2020-08-27 12:06   ` Jim Baxter
@ 2020-09-08 15:54     ` Vlastimil Babka
  0 siblings, 0 replies; 5+ messages in thread
From: Vlastimil Babka @ 2020-09-08 15:54 UTC (permalink / raw)
  To: Jim Baxter, linux-fsdevel, linux-kernel, linux-mm, linux-block
  Cc: linux-usb, Frkuska, Joshua, Resch Carsten (CM/ESO6),
	Rosca, Eugeniu (ADITG/ESB),
	Craske, Mark, Brown, Michael

On 8/27/20 2:06 PM, Jim Baxter wrote:
> Has anyone any ideas of how to investigate this delay further?
> 
> Comparing the perf output for unplugging the USB stick and using umount
> which does not cause these delays in other workqueues the main difference

I don't have that much insight into this, but isn't it that in the case
of umount, exactly the same work is done in the umount process context
rather than on a workqueue? It might take the same time and CPU and
stress the same paths, but because it is attributed to the process, no
workqueue delays are reported. Or do your measurements suggest
otherwise?

> is that the problem case is executing the code in invalidate_mapping_pages()
> and a large part of that arch_local_irq_restore() which is part of
> releasing a lock, I would usually expect that requesting a lock would be
> where delays may occur.
> 
> 	--94.90%--invalidate_partition
> 	   __invalidate_device
> 	   |          
> 	   |--64.55%--invalidate_bdev
> 	   |  |          
> 	   |   --64.13%--invalidate_mapping_pages
> 	   |     |          
> 	   |     |--24.09%--invalidate_inode_page
> 	   |     |   |          
> 	   |     |   --23.44%--remove_mapping
> 	   |     |     |          
> 	   |     |      --23.20%--__remove_mapping
> 	   |     |        |          
> 	   |     |         --21.90%--arch_local_irq_restore
> 	   |     |          
> 	   |     |--22.44%--arch_local_irq_enable
> 
> Best regards,
> Jim
> 
> [earlier messages snipped]



end of thread, other threads:[~2020-09-08 19:37 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-18 11:58 PROBLEM: Long Workqueue delays V2 Jim Baxter
2020-08-18 16:45 ` Jim Baxter
2020-08-19 13:12 ` Jim Baxter
2020-08-27 12:06   ` Jim Baxter
2020-09-08 15:54     ` Vlastimil Babka
