* [RESEND PATCH 1/2] tick-sched: Do not clear the iowait and idle times
2020-09-09 14:41 [RESEND PATCH 0/2] iowait and idle fixes in /proc/stat Tom Hromatka
@ 2020-09-09 14:41 ` Tom Hromatka
2020-09-13 21:27 ` Thomas Gleixner
2020-09-09 14:41 ` [RESEND PATCH 2/2] /proc/stat: Simplify iowait and idle calculations when cpu is offline Tom Hromatka
2020-09-14 16:31 ` [RESEND PATCH 0/2] iowait and idle fixes in /proc/stat Tom Hromatka
2 siblings, 1 reply; 7+ messages in thread
From: Tom Hromatka @ 2020-09-09 14:41 UTC
To: tom.hromatka, linux-kernel, linux-fsdevel, fweisbec, tglx, mingo,
adobriyan
A customer reported that when a cpu goes offline and then comes back
online, the overall cpu idle and iowait data in /proc/stat decreases.
This is wreaking havoc with their cpu usage calculations.
Prior to this patch:
         user  nice  system     idle  iowait
cpu   1390748   636  209444  9802206   19598
cpu1   178384    75   24545  1392450    3025
take cpu1 offline and bring it back online
         user  nice  system     idle  iowait
cpu   1391209   636  209682  8453440   16595
cpu1   178440    75   24572      627       0
To prevent this, do not clear the idle and iowait times for the
cpu that has come back online.
With this patch:
         user  nice  system     idle  iowait
cpu    129913    17   17590   166512     704
cpu1    15916     3    2395    20989      47
take cpu1 offline and bring it back online
         user  nice  system     idle  iowait
cpu    130089    17   17686   184625     711
cpu1    15942     3    2401    23088      47
Signed-off-by: Tom Hromatka <tom.hromatka@oracle.com>
---
kernel/time/tick-sched.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 3e2dc9b8858c..8103bad7bbd6 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1375,13 +1375,22 @@ void tick_setup_sched_timer(void)
 void tick_cancel_sched_timer(int cpu)
 {
 	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+	ktime_t idle_sleeptime, iowait_sleeptime;
 
 # ifdef CONFIG_HIGH_RES_TIMERS
 	if (ts->sched_timer.base)
 		hrtimer_cancel(&ts->sched_timer);
 # endif
 
+	/* save off and restore the idle_sleeptime and the iowait_sleeptime
+	 * to avoid discontinuities and ensure that they are monotonically
+	 * increasing
+	 */
+	idle_sleeptime = ts->idle_sleeptime;
+	iowait_sleeptime = ts->iowait_sleeptime;
 	memset(ts, 0, sizeof(*ts));
+	ts->idle_sleeptime = idle_sleeptime;
+	ts->iowait_sleeptime = iowait_sleeptime;
 }
 #endif
--
2.25.4
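To illustrate why a counter that moves backwards is a problem for consumers of /proc/stat, here is a minimal userspace sketch. It is not part of the patch. It assumes a monitoring tool that stores the first five fields of the aggregate "cpu" line as unsigned 64-bit values and computes deltas between two samples; struct cpu_sample, parse_cpu_line() and idle_delta() are hypothetical names chosen for this example.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* First five fields of the aggregate "cpu" line in /proc/stat. */
struct cpu_sample {
	uint64_t user, nice, system, idle, iowait;
};

/*
 * Parse the aggregate "cpu ..." line only (not "cpu0", "cpu1", ...).
 * Returns 0 on success, -1 if the line does not match.
 */
static int parse_cpu_line(const char *line, struct cpu_sample *s)
{
	int n;

	if (strncmp(line, "cpu ", 4) != 0)
		return -1;

	n = sscanf(line,
		   "cpu %" SCNu64 " %" SCNu64 " %" SCNu64 " %" SCNu64 " %" SCNu64,
		   &s->user, &s->nice, &s->system, &s->idle, &s->iowait);
	return n == 5 ? 0 : -1;
}

/*
 * Compute the idle delta between two samples.
 * Returns -1 when the counter went backwards; an unchecked unsigned
 * subtraction would wrap around to a huge value in that case.
 */
static int idle_delta(const struct cpu_sample *prev,
		      const struct cpu_sample *cur, uint64_t *delta)
{
	if (cur->idle < prev->idle)
		return -1;

	*delta = cur->idle - prev->idle;
	return 0;
}
```

Fed the two aggregate lines from the "Prior to this patch" example above, idle_delta() reports an error instead of returning the wrapped difference that a plain unsigned subtraction would produce.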
* Re: [RESEND PATCH 1/2] tick-sched: Do not clear the iowait and idle times
From: Thomas Gleixner @ 2020-09-13 21:27 UTC
To: Tom Hromatka, tom.hromatka, linux-kernel, linux-fsdevel,
fweisbec, mingo, adobriyan
Tom,
On Wed, Sep 09 2020 at 08:41, Tom Hromatka wrote:
> A customer reported that when a cpu goes offline and then comes back
> online, the overall cpu idle and iowait data in /proc/stat decreases.
> This is wreaking havoc with their cpu usage calculations.
for a changelog it's pretty irrelevant whether a customer reported
something or not.
Fact is that this happens and you fail to explain WHY it happens,
i.e. because the values are cleared when the CPU goes down and therefore
the accounting starts over from 0 when the CPU comes online again.
Describing this is much more useful than showing random numbers before
and after.
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -1375,13 +1375,22 @@ void tick_setup_sched_timer(void)
>  void tick_cancel_sched_timer(int cpu)
>  {
>  	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
> +	ktime_t idle_sleeptime, iowait_sleeptime;
>
>  # ifdef CONFIG_HIGH_RES_TIMERS
>  	if (ts->sched_timer.base)
>  		hrtimer_cancel(&ts->sched_timer);
>  # endif
>
> +	/* save off and restore the idle_sleeptime and the iowait_sleeptime
> +	 * to avoid discontinuities and ensure that they are monotonically
> +	 * increasing
> +	 */
	/*
	 * Please use sane multiline comment style and not the above
	 * abomination.
	 */
Also please explain what this 'monotonically increasing' thing is
about. Without consulting the changelog it's hard to figure out what
that means.
Comments are valuable, but only when they actually make sense on
their own. Something like the below perhaps?
	/*
	 * Preserve idle and iowait sleep times across a CPU offline/online
	 * sequence as they are accumulative.
	 */
Hmm?
> +	idle_sleeptime = ts->idle_sleeptime;
> +	iowait_sleeptime = ts->iowait_sleeptime;
>  	memset(ts, 0, sizeof(*ts));
> +	ts->idle_sleeptime = idle_sleeptime;
> +	ts->iowait_sleeptime = iowait_sleeptime;
>  }
Thanks,
tglx
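The save/clear/restore pattern discussed in this review can be modelled outside the kernel. The following is only a sketch under stated assumptions: struct tick_sched_model is a hypothetical stand-in for the kernel's struct tick_sched, its other_state member represents all the state that is meant to be reset, and int64_t stands in for ktime_t. The comment uses the multi-line style requested in the review.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct tick_sched; not the kernel layout. */
struct tick_sched_model {
	int64_t idle_sleeptime;   /* accumulated idle time */
	int64_t iowait_sleeptime; /* accumulated iowait time */
	int other_state;          /* placeholder for state that must be reset */
};

/* Model of the patched teardown path: reset everything but the totals. */
static void cancel_sched_timer_model(struct tick_sched_model *ts)
{
	int64_t idle_sleeptime, iowait_sleeptime;

	/*
	 * Preserve idle and iowait sleep times across a CPU offline/online
	 * sequence as they are accumulative.
	 */
	idle_sleeptime = ts->idle_sleeptime;
	iowait_sleeptime = ts->iowait_sleeptime;
	memset(ts, 0, sizeof(*ts));
	ts->idle_sleeptime = idle_sleeptime;
	ts->iowait_sleeptime = iowait_sleeptime;
}
```

Without the two restore assignments the memset() would zero both totals, which is the reset to 0 that makes the values in /proc/stat drop after the CPU comes back online.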
* [RESEND PATCH 2/2] /proc/stat: Simplify iowait and idle calculations when cpu is offline
From: Tom Hromatka @ 2020-09-09 14:41 UTC
To: tom.hromatka, linux-kernel, linux-fsdevel, fweisbec, tglx, mingo,
adobriyan
A customer reported that when a cpu goes offline, the iowait and idle
times reported in /proc/stat will sometimes spike. This is being
caused by a different data source being used for these values when a
cpu is offline.
Prior to this patch:
put the system under heavy load so that there is little idle time
         user  nice  system     idle  iowait
cpu    109515    17   32111   220686     607
take cpu1 offline
         user  nice  system     idle  iowait
cpu    113742    17   32721   220724     612
bring cpu1 back online
         user  nice  system     idle  iowait
cpu    118332    17   33430   220687     607
To prevent this, let's use the same data source whether a cpu is
online or not.
With this patch:
put the system under heavy load so that there is little idle time
         user  nice  system     idle  iowait
cpu     14096    16    4646   157687     426
take cpu1 offline
         user  nice  system     idle  iowait
cpu     21614    16    7179   157687     426
bring cpu1 back online
         user  nice  system     idle  iowait
cpu     27362    16    9555   157688     426
Signed-off-by: Tom Hromatka <tom.hromatka@oracle.com>
---
fs/proc/stat.c | 24 ++++++------------------
1 file changed, 6 insertions(+), 18 deletions(-)
diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index 46b3293015fe..35b92539e711 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -47,32 +47,20 @@ static u64 get_iowait_time(struct kernel_cpustat *kcs, int cpu)
 static u64 get_idle_time(struct kernel_cpustat *kcs, int cpu)
 {
-	u64 idle, idle_usecs = -1ULL;
+	u64 idle, idle_usecs;
 
-	if (cpu_online(cpu))
-		idle_usecs = get_cpu_idle_time_us(cpu, NULL);
-
-	if (idle_usecs == -1ULL)
-		/* !NO_HZ or cpu offline so we can rely on cpustat.idle */
-		idle = kcs->cpustat[CPUTIME_IDLE];
-	else
-		idle = idle_usecs * NSEC_PER_USEC;
+	idle_usecs = get_cpu_idle_time_us(cpu, NULL);
+	idle = idle_usecs * NSEC_PER_USEC;
 
 	return idle;
 }
 
 static u64 get_iowait_time(struct kernel_cpustat *kcs, int cpu)
 {
-	u64 iowait, iowait_usecs = -1ULL;
-
-	if (cpu_online(cpu))
-		iowait_usecs = get_cpu_iowait_time_us(cpu, NULL);
+	u64 iowait, iowait_usecs;
 
-	if (iowait_usecs == -1ULL)
-		/* !NO_HZ or cpu offline so we can rely on cpustat.iowait */
-		iowait = kcs->cpustat[CPUTIME_IOWAIT];
-	else
-		iowait = iowait_usecs * NSEC_PER_USEC;
+	iowait_usecs = get_cpu_iowait_time_us(cpu, NULL);
+	iowait = iowait_usecs * NSEC_PER_USEC;
 
 	return iowait;
 }
--
2.25.4
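The effect described in this changelog, a reading that jumps when the data source changes with the online state, can be shown with a toy model. This is a sketch only: struct cpu_model and both helper functions are hypothetical, and the two members merely stand in for the two kernel sources named in the diff (the value derived from get_cpu_idle_time_us() and kcs->cpustat[CPUTIME_IDLE]). The model assumes nothing about how the kernel maintains either counter beyond them being separate.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one CPU with two independently maintained idle counters. */
struct cpu_model {
	bool online;
	uint64_t nohz_idle_ns;    /* stands in for get_cpu_idle_time_us() * NSEC_PER_USEC */
	uint64_t cpustat_idle_ns; /* stands in for kcs->cpustat[CPUTIME_IDLE] */
};

/* Behaviour before the patch: the source depends on the online state. */
static uint64_t idle_switching_source(const struct cpu_model *c)
{
	return c->online ? c->nohz_idle_ns : c->cpustat_idle_ns;
}

/* Behaviour after the patch: the same source is used in both states. */
static uint64_t idle_single_source(const struct cpu_model *c)
{
	return c->nohz_idle_ns;
}
```

If the two counters disagree, idle_switching_source() returns a different value as soon as the CPU changes state, and returns to the old value when the CPU comes back. That is the up-and-down pattern visible in the idle column of the "Prior to this patch" numbers.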
* Re: [RESEND PATCH 2/2] /proc/stat: Simplify iowait and idle calculations when cpu is offline
From: Alexey Dobriyan @ 2020-09-10 12:14 UTC
To: Tom Hromatka; +Cc: linux-kernel, linux-fsdevel, fweisbec, tglx, mingo
On Wed, Sep 09, 2020 at 08:41:22AM -0600, Tom Hromatka wrote:
>  static u64 get_idle_time(struct kernel_cpustat *kcs, int cpu)
>  {
> -	u64 idle, idle_usecs = -1ULL;
> +	u64 idle, idle_usecs;
>
> -	if (cpu_online(cpu))
> -		idle_usecs = get_cpu_idle_time_us(cpu, NULL);
> -
> -	if (idle_usecs == -1ULL)
> -		/* !NO_HZ or cpu offline so we can rely on cpustat.idle */
> -		idle = kcs->cpustat[CPUTIME_IDLE];
> -	else
> -		idle = idle_usecs * NSEC_PER_USEC;
> +	idle_usecs = get_cpu_idle_time_us(cpu, NULL);
> +	idle = idle_usecs * NSEC_PER_USEC;
>
>  	return idle;
>  }
>
>  static u64 get_iowait_time(struct kernel_cpustat *kcs, int cpu)
>  {
> -	u64 iowait, iowait_usecs = -1ULL;
> -
> -	if (cpu_online(cpu))
> -		iowait_usecs = get_cpu_iowait_time_us(cpu, NULL);
> +	u64 iowait, iowait_usecs;
>
> -	if (iowait_usecs == -1ULL)
> -		/* !NO_HZ or cpu offline so we can rely on cpustat.iowait */
> -		iowait = kcs->cpustat[CPUTIME_IOWAIT];
> -	else
> -		iowait = iowait_usecs * NSEC_PER_USEC;
> +	iowait_usecs = get_cpu_iowait_time_us(cpu, NULL);
> +	iowait = iowait_usecs * NSEC_PER_USEC;
You can gc variables in both cases:
return get_cpu_iowait_time_us() * NSEC_PER_USEC;
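Spelled out, the suggestion amounts to a helper with no local variables. The sketch below is a userspace model, not kernel code: stub_get_cpu_iowait_time_us() is a hypothetical stub that returns fixed values, NSEC_PER_USEC is defined locally, and the call passes (cpu, NULL) as the patch does, since the one-line snippet above omits the arguments. The code removed by the patch treated a return value of -1ULL as a signal to fall back to cpustat; this sketch does not model that case.

```c
#include <stddef.h>
#include <stdint.h>

/* Local definition for this sketch; the kernel provides its own. */
#define NSEC_PER_USEC 1000ULL

/* Hypothetical stub standing in for the kernel's get_cpu_iowait_time_us(). */
static uint64_t stub_get_cpu_iowait_time_us(int cpu, uint64_t *last_update_time)
{
	static const uint64_t iowait_us[] = { 47, 3025 };

	(void)last_update_time; /* unused in this model */
	return iowait_us[cpu];
}

/* The helper with both temporaries removed. */
static uint64_t get_iowait_time_model(int cpu)
{
	return stub_get_cpu_iowait_time_us(cpu, NULL) * NSEC_PER_USEC;
}
```

The same shape applies to the idle helper, with the idle-time function in place of the iowait one.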
* Re: [RESEND PATCH 2/2] /proc/stat: Simplify iowait and idle calculations when cpu is offline
From: Thomas Gleixner @ 2020-09-13 21:35 UTC
To: Tom Hromatka, tom.hromatka, linux-kernel, linux-fsdevel,
fweisbec, mingo, adobriyan
On Wed, Sep 09 2020 at 08:41, Tom Hromatka wrote:
> A customer reported that when a cpu goes offline, the iowait and idle
> times reported in /proc/stat will sometimes spike. This is being
> caused by a different data source being used for these values when a
> cpu is offline.
>
> Prior to this patch:
>
> put the system under heavy load so that there is little idle time
>
> user nice system idle iowait
> cpu 109515 17 32111 220686 607
>
> take cpu1 offline
>
> user nice system idle iowait
> cpu 113742 17 32721 220724 612
>
> bring cpu1 back online
>
> user nice system idle iowait
> cpu 118332 17 33430 220687 607
>
> To prevent this, let's use the same data source whether a cpu is
> online or not.
Let's use? Your patch makes it use the same data source.
And again, neither the customer story nor the numbers are helpful to
understand the underlying problem. Also this lacks a reference to the
previous change which preserves the times across a CPU offline/online
sequence.
> diff --git a/fs/proc/stat.c b/fs/proc/stat.c
> index 46b3293015fe..35b92539e711 100644
> --- a/fs/proc/stat.c
> +++ b/fs/proc/stat.c
> @@ -47,32 +47,20 @@ static u64 get_iowait_time(struct kernel_cpustat *kcs, int cpu)
>
>  static u64 get_idle_time(struct kernel_cpustat *kcs, int cpu)
>  {
> -	u64 idle, idle_usecs = -1ULL;
> +	u64 idle, idle_usecs;
>
> -	if (cpu_online(cpu))
> -		idle_usecs = get_cpu_idle_time_us(cpu, NULL);
> -
> -	if (idle_usecs == -1ULL)
> -		/* !NO_HZ or cpu offline so we can rely on cpustat.idle */
> -		idle = kcs->cpustat[CPUTIME_IDLE];
> -	else
> -		idle = idle_usecs * NSEC_PER_USEC;
> +	idle_usecs = get_cpu_idle_time_us(cpu, NULL);
> +	idle = idle_usecs * NSEC_PER_USEC;
>
>  	return idle;
return get_cpu_idle_time_us(cpu, NULL) * NSEC_PER_USEC;
perhaps?
Thanks,
tglx
* Re: [RESEND PATCH 0/2] iowait and idle fixes in /proc/stat
From: Tom Hromatka @ 2020-09-14 16:31 UTC
To: linux-kernel, linux-fsdevel, fweisbec, tglx, mingo, adobriyan
Thanks for your time and feedback, Thomas and Alexey. I'll
address the comments and send out a v2 in the next couple
days.
Thanks!
Tom
On 9/9/20 8:41 AM, Tom Hromatka wrote:
> A customer is using /proc/stat to track cpu usage in a VM and noted
> that the iowait and idle times behave strangely when a cpu goes
> offline and comes back online.
>
> This patchset addresses two issues that can cause iowait and idle
> to fluctuate up and down. With these changes, cpu iowait and idle
> now only monotonically increase.
>
> Tom Hromatka (2):
> tick-sched: Do not clear the iowait and idle times
> /proc/stat: Simplify iowait and idle calculations when cpu is offline
>
> fs/proc/stat.c | 24 ++++++------------------
> kernel/time/tick-sched.c | 9 +++++++++
> 2 files changed, 15 insertions(+), 18 deletions(-)
>