Linux-Fsdevel Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH v2 0/2] iowait and idle fixes in /proc/stat
@ 2020-09-15 19:36 Tom Hromatka
  2020-09-15 19:36 ` [PATCH v2 1/2] tick-sched: Do not clear the iowait and idle times Tom Hromatka
  2020-09-15 19:36 ` [PATCH v2 2/2] /proc/stat: Simplify iowait and idle calculations when cpu is offline Tom Hromatka
  0 siblings, 2 replies; 7+ messages in thread
From: Tom Hromatka @ 2020-09-15 19:36 UTC (permalink / raw)
  To: tom.hromatka, linux-kernel, linux-fsdevel, fweisbec, tglx, mingo,
	adobriyan

A customer is using /proc/stat to track cpu usage in a VM and noted
that the iowait and idle times behave strangely when a cpu goes
offline and comes back online.

This patchset addresses two issues that can cause iowait and idle
to fluctuate up and down.  With these changes, cpu iowait and idle
now only increase.

Changes from v1 to v2:
* Cleaned up commit messages
* Clarified code comments
* Further optimized the logic in fs/proc/stat.c

Tom Hromatka (2):
  tick-sched: Do not clear the iowait and idle times
  /proc/stat: Simplify iowait and idle calculations when cpu is offline

 fs/proc/stat.c           | 26 ++------------------------
 kernel/time/tick-sched.c |  9 +++++++++
 2 files changed, 11 insertions(+), 24 deletions(-)

-- 
2.25.4


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v2 1/2] tick-sched: Do not clear the iowait and idle times
  2020-09-15 19:36 [PATCH v2 0/2] iowait and idle fixes in /proc/stat Tom Hromatka
@ 2020-09-15 19:36 ` Tom Hromatka
  2020-09-24 20:41   ` Thomas Gleixner
  2020-09-15 19:36 ` [PATCH v2 2/2] /proc/stat: Simplify iowait and idle calculations when cpu is offline Tom Hromatka
  1 sibling, 1 reply; 7+ messages in thread
From: Tom Hromatka @ 2020-09-15 19:36 UTC (permalink / raw)
  To: tom.hromatka, linux-kernel, linux-fsdevel, fweisbec, tglx, mingo,
	adobriyan

Prior to this commit, the cpu idle and iowait data in /proc/stat were
cleared when a CPU goes down.  When the CPU came back online, both idle
and iowait times were restarted from 0.

This commit preserves the CPU's idle and iowait values when a CPU goes
offline and comes back online.

Signed-off-by: Tom Hromatka <tom.hromatka@oracle.com>
---
 kernel/time/tick-sched.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index f0199a4ba1ad..4fbf67171dde 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1383,13 +1383,22 @@ void tick_setup_sched_timer(void)
 void tick_cancel_sched_timer(int cpu)
 {
 	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+	ktime_t idle_sleeptime, iowait_sleeptime;
 
 # ifdef CONFIG_HIGH_RES_TIMERS
 	if (ts->sched_timer.base)
 		hrtimer_cancel(&ts->sched_timer);
 # endif
 
+	/*
+	 * Preserve idle and iowait sleep times accross a CPU offline/online
+	 * sequence as they are accumulative.
+	 */
+	idle_sleeptime = ts->idle_sleeptime;
+	iowait_sleeptime = ts->iowait_sleeptime;
 	memset(ts, 0, sizeof(*ts));
+	ts->idle_sleeptime = idle_sleeptime;
+	ts->iowait_sleeptime = iowait_sleeptime;
 }
 #endif
 
-- 
2.25.4


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v2 2/2] /proc/stat: Simplify iowait and idle calculations when cpu is offline
  2020-09-15 19:36 [PATCH v2 0/2] iowait and idle fixes in /proc/stat Tom Hromatka
  2020-09-15 19:36 ` [PATCH v2 1/2] tick-sched: Do not clear the iowait and idle times Tom Hromatka
@ 2020-09-15 19:36 ` Tom Hromatka
  2020-09-24 21:19   ` Thomas Gleixner
  1 sibling, 1 reply; 7+ messages in thread
From: Tom Hromatka @ 2020-09-15 19:36 UTC (permalink / raw)
  To: tom.hromatka, linux-kernel, linux-fsdevel, fweisbec, tglx, mingo,
	adobriyan

Prior to this commit, the cpu idle and iowait data in /proc/stat used
different data sources based upon whether the CPU was online or not.
This would cause spikes in the cpu idle and iowait data.

This patch uses the same data source, get_cpu_{idle,iowait}_time_us(),
whether the CPU is online or not.

This patch and the preceding patch, "tick-sched: Do not clear the
iowait and idle times", ensure that the cpu idle and iowait data
are always increasing.

Signed-off-by: Tom Hromatka <tom.hromatka@oracle.com>
---
 fs/proc/stat.c | 26 ++------------------------
 1 file changed, 2 insertions(+), 24 deletions(-)

diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index 46b3293015fe..198f3c50cb91 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -47,34 +47,12 @@ static u64 get_iowait_time(struct kernel_cpustat *kcs, int cpu)
 
 static u64 get_idle_time(struct kernel_cpustat *kcs, int cpu)
 {
-	u64 idle, idle_usecs = -1ULL;
-
-	if (cpu_online(cpu))
-		idle_usecs = get_cpu_idle_time_us(cpu, NULL);
-
-	if (idle_usecs == -1ULL)
-		/* !NO_HZ or cpu offline so we can rely on cpustat.idle */
-		idle = kcs->cpustat[CPUTIME_IDLE];
-	else
-		idle = idle_usecs * NSEC_PER_USEC;
-
-	return idle;
+	return get_cpu_idle_time_us(cpu, NULL) * NSEC_PER_USEC;
 }
 
 static u64 get_iowait_time(struct kernel_cpustat *kcs, int cpu)
 {
-	u64 iowait, iowait_usecs = -1ULL;
-
-	if (cpu_online(cpu))
-		iowait_usecs = get_cpu_iowait_time_us(cpu, NULL);
-
-	if (iowait_usecs == -1ULL)
-		/* !NO_HZ or cpu offline so we can rely on cpustat.iowait */
-		iowait = kcs->cpustat[CPUTIME_IOWAIT];
-	else
-		iowait = iowait_usecs * NSEC_PER_USEC;
-
-	return iowait;
+	return get_cpu_iowait_time_us(cpu, NULL) * NSEC_PER_USEC;
 }
 
 #endif
-- 
2.25.4


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 1/2] tick-sched: Do not clear the iowait and idle times
  2020-09-15 19:36 ` [PATCH v2 1/2] tick-sched: Do not clear the iowait and idle times Tom Hromatka
@ 2020-09-24 20:41   ` Thomas Gleixner
  0 siblings, 0 replies; 7+ messages in thread
From: Thomas Gleixner @ 2020-09-24 20:41 UTC (permalink / raw)
  To: Tom Hromatka, tom.hromatka, linux-kernel, linux-fsdevel,
	fweisbec, mingo, adobriyan

On Tue, Sep 15 2020 at 13:36, Tom Hromatka wrote:
> Prior to this commit, the cpu idle and iowait data in /proc/stat were
> cleared when a CPU goes down.  When the CPU came back online, both idle
> and iowait times were restarted from 0.

Starting a commit message with 'Prior to this commit' is
pointless. Describe the factual problem which made you come up with this
change.

>
> This commit preserves the CPU's idle and iowait values when a CPU goes
> offline and comes back online.

'This commit does' is just a variation of 'This patch does'.

git grep 'This patch' Documentation/process/

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 2/2] /proc/stat: Simplify iowait and idle calculations when cpu is offline
  2020-09-15 19:36 ` [PATCH v2 2/2] /proc/stat: Simplify iowait and idle calculations when cpu is offline Tom Hromatka
@ 2020-09-24 21:19   ` Thomas Gleixner
  0 siblings, 0 replies; 7+ messages in thread
From: Thomas Gleixner @ 2020-09-24 21:19 UTC (permalink / raw)
  To: Tom Hromatka, tom.hromatka, linux-kernel, linux-fsdevel,
	fweisbec, mingo, adobriyan

On Tue, Sep 15 2020 at 13:36, Tom Hromatka wrote:
> Prior to this commit, the cpu idle and iowait data in /proc/stat used
> different data sources based upon whether the CPU was online or not.
> This would cause spikes in the cpu idle and iowait data.

This would not cause spikes. It _causes_ these times to go backwards and
start over from 0. That's something completely different than a spike.

Please describe problems precisely.

> This patch uses the same data source, get_cpu_{idle,iowait}_time_us(),
> whether the CPU is online or not.
>
> This patch and the preceding patch, "tick-sched: Do not clear the
> iowait and idle times", ensure that the cpu idle and iowait data
> are always increasing.

So now you have a mixture of 'This commit and this patch'. Oh well.

Aside of that the ordering of your changelog is backwards. Something
like this:

   The CPU idle and iowait times in /proc/stats are inconsistent accross
   CPU hotplug.

   The reason is that for NOHZ active systems the core accounting of CPU
   idle and iowait times used to be reset when a CPU was unplugged. The
   /proc/stat code tries to work around that by using the corresponding
   member of kernel_cpustat when the CPU is offline.

   This works as long as the CPU stays offline, but when it is onlined
   again then the accounting is taken from the NOHZ core data again
   which started over from 0 causing both times to go backwards.

   The HOHZ core has been fixed to preserve idle and iowait times
   accross CPU unplug, so the broken workaround is not longer required.

Hmm?

But...

> --- a/fs/proc/stat.c
> +++ b/fs/proc/stat.c
> @@ -47,34 +47,12 @@ static u64 get_iowait_time(struct kernel_cpustat *kcs, int cpu)
>  
>  static u64 get_idle_time(struct kernel_cpustat *kcs, int cpu)
>  {
> -	u64 idle, idle_usecs = -1ULL;
> -
> -	if (cpu_online(cpu))
> -		idle_usecs = get_cpu_idle_time_us(cpu, NULL);
> -
> -	if (idle_usecs == -1ULL)
> -		/* !NO_HZ or cpu offline so we can rely on cpustat.idle */
> -		idle = kcs->cpustat[CPUTIME_IDLE];
> -	else
> -		idle = idle_usecs * NSEC_PER_USEC;
> -
> -	return idle;
> +	return get_cpu_idle_time_us(cpu, NULL) * NSEC_PER_USEC;

Q: How is this supposed to work on !NO_HZ systems or in case that NOHZ
   has been disabled at boot time via command line option or lack of
   hardware?

A: Not at all.

Hint #1: You removed the following comment:

	/* !NO_HZ or cpu offline so we can rely on cpustat.idle */

Hint #2: There is more than one valid kernel configuration.
'
Hint #3: Command line options and hardware features have side effects

Hint #4: git grep 'get_cpu_.*_time_us' 

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 2/2] /proc/stat: Simplify iowait and idle calculations when cpu is offline
  2020-09-28  2:59 Tom Hromatka
@ 2020-09-28 14:09 ` Thomas Gleixner
  0 siblings, 0 replies; 7+ messages in thread
From: Thomas Gleixner @ 2020-09-28 14:09 UTC (permalink / raw)
  To: Tom Hromatka; +Cc: mingo, fweisbec, adobriyan, linux-kernel, linux-fsdevel

Tom,

On Sun, Sep 27 2020 at 19:59, Tom Hromatka wrote:

> My sincere apologies.  2020 has been a challenging year for
> my family and me, and I readily admit that I have struggled
> with all of the added stress.  I realize and acknowledge that
> this is not an acceptable excuse for a patchset that doesn't
> hold up to the kernel or my standards.

don't worry. We all have periods where we are not up to the task. Take
your time!

Thanks,

        Thomas

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 2/2] /proc/stat: Simplify iowait and idle calculations when cpu is offline
@ 2020-09-28  2:59 Tom Hromatka
  2020-09-28 14:09 ` Thomas Gleixner
  0 siblings, 1 reply; 7+ messages in thread
From: Tom Hromatka @ 2020-09-28  2:59 UTC (permalink / raw)
  To: tglx; +Cc: mingo, fweisbec, adobriyan, linux-kernel, linux-fsdevel

My sincere apologies.  2020 has been a challenging year for
my family and me, and I readily admit that I have struggled
with all of the added stress.  I realize and acknowledge that
this is not an acceptable excuse for a patchset that doesn't
hold up to the kernel or my standards.

Thank you for your time and your sincere response.

Best regards,

Tom


----- Original Message -----
From: tglx@linutronix.de
To: tom.hromatka@oracle.com, tom.hromatka@oracle.com, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, fweisbec@gmail.com, mingo@kernel.org, adobriyan@gmail.com
Sent: Thursday, September 24, 2020 3:19:58 PM GMT -07:00 US/Canada Mountain
Subject: Re: [PATCH v2 2/2] /proc/stat: Simplify iowait and idle calculations when cpu is offline

On Tue, Sep 15 2020 at 13:36, Tom Hromatka wrote:
> Prior to this commit, the cpu idle and iowait data in /proc/stat used
> different data sources based upon whether the CPU was online or not.
> This would cause spikes in the cpu idle and iowait data.

This would not cause spikes. It _causes_ these times to go backwards and
start over from 0. That's something completely different than a spike.

Please describe problems precisely.

> This patch uses the same data source, get_cpu_{idle,iowait}_time_us(),
> whether the CPU is online or not.
>
> This patch and the preceding patch, "tick-sched: Do not clear the
> iowait and idle times", ensure that the cpu idle and iowait data
> are always increasing.

So now you have a mixture of 'This commit and this patch'. Oh well.

Aside of that the ordering of your changelog is backwards. Something
like this:

   The CPU idle and iowait times in /proc/stats are inconsistent accross
   CPU hotplug.

   The reason is that for NOHZ active systems the core accounting of CPU
   idle and iowait times used to be reset when a CPU was unplugged. The
   /proc/stat code tries to work around that by using the corresponding
   member of kernel_cpustat when the CPU is offline.

   This works as long as the CPU stays offline, but when it is onlined
   again then the accounting is taken from the NOHZ core data again
   which started over from 0 causing both times to go backwards.

   The HOHZ core has been fixed to preserve idle and iowait times
   accross CPU unplug, so the broken workaround is not longer required.

Hmm?

But...

> --- a/fs/proc/stat.c
> +++ b/fs/proc/stat.c
> @@ -47,34 +47,12 @@ static u64 get_iowait_time(struct kernel_cpustat *kcs, int cpu)
>  
>  static u64 get_idle_time(struct kernel_cpustat *kcs, int cpu)
>  {
> -	u64 idle, idle_usecs = -1ULL;
> -
> -	if (cpu_online(cpu))
> -		idle_usecs = get_cpu_idle_time_us(cpu, NULL);
> -
> -	if (idle_usecs == -1ULL)
> -		/* !NO_HZ or cpu offline so we can rely on cpustat.idle */
> -		idle = kcs->cpustat[CPUTIME_IDLE];
> -	else
> -		idle = idle_usecs * NSEC_PER_USEC;
> -
> -	return idle;
> +	return get_cpu_idle_time_us(cpu, NULL) * NSEC_PER_USEC;

Q: How is this supposed to work on !NO_HZ systems or in case that NOHZ
   has been disabled at boot time via command line option or lack of
   hardware?

A: Not at all.

Hint #1: You removed the following comment:

	/* !NO_HZ or cpu offline so we can rely on cpustat.idle */

Hint #2: There is more than one valid kernel configuration.
'
Hint #3: Command line options and hardware features have side effects

Hint #4: git grep 'get_cpu_.*_time_us' 

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-09-28 14:09 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-15 19:36 [PATCH v2 0/2] iowait and idle fixes in /proc/stat Tom Hromatka
2020-09-15 19:36 ` [PATCH v2 1/2] tick-sched: Do not clear the iowait and idle times Tom Hromatka
2020-09-24 20:41   ` Thomas Gleixner
2020-09-15 19:36 ` [PATCH v2 2/2] /proc/stat: Simplify iowait and idle calculations when cpu is offline Tom Hromatka
2020-09-24 21:19   ` Thomas Gleixner
2020-09-28  2:59 Tom Hromatka
2020-09-28 14:09 ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).