LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Alexei Lozovsky <me@ilammy.net>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>,
	Christoph Lameter <cl@linux.com>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-fsdevel@vger.kernel.org
Subject: /proc/stat interrupt counter wrap-around
Date: Fri, 10 Sep 2021 17:53:28 +0900	[thread overview]
Message-ID: <06F4B1B0-E4DE-4380-A8E1-A5ACAD285163@ilammy.net> (raw)

A monitoring dashboard caught my attention when it displayed weird
spikes in computed interrupt rates. A machine with constant network
load shows about ~250k interrupts per second, then every 4-5 hours
there's a one-off spike to 11 billions.

Turns out, if you plot the interrupt counter, you get a graph like this:

                                     ###.                           
                                  ###   | ####.              #######
                  #####.     #####      ##    |     #########       
             #####     |   ##                 ######                
            #          ####                                         
        ####                                                        
       #                                                            
    ###                                                             

While monitoring tools are typically used to handling counter
wrap-arounds, they may not be ready to handle dips like this.


What is the impact
------------------

Not much, actually.

The counters always decrement by exactly 2^32 (which is suggestive),
so if you mask out the high bits of the counter and consider only
the low 32 bits, then the value sequence actually make sense,
given an appropriate sampling rate.

However, if you don't mask out the value and assume it to be accurate --
well, that assumption is incorrect. Interrupt sums might look correct
and contain some big number, but it could be arbitrarily distant from
the actual number of interrupts serviced since the boot time.

This concerns only the total value of "intr" and "softirq" rows:

    intr    14390913189 32 11 0 0 238 0 0 0 0 0 0 0 88 0 [...]
    softirq 14625063745 0 596000256 300149 272619841 0 0 [...]
            ^^^^^^^^^^^
            these ones


Why this happens
----------------

The reason for such behaviour is that the "total" interrupt counters
presented by /proc/stat are actually computed by adding up per-interrupt
per-CPU counters. Most of these are "unsigned int", while some of them
are "unsigned long", and the accumulator is "u64". What a mess...

Individual counters are monotonically increasing (modulo wrapping),
however if you add multiple values with different bit widths then
the sum is *not* guaranteed to be monotonically increasing.


What can be done
----------------

 1. Do nothing.

    Userspace can trivially compensate for this 'curious' behavior
    by masking out the high bits, observing only the low
    sizeof(unsigned) part, and taking care to handle wrap-arounds.

    This maintains status quo, but the "issue" of interrupt sums
    not being quite accurate remains.


 2. Change the presentation type to the lowest denominator.

    That is, unsigned int. Make the kernel mask out not-quite-accurate
    bits from the value it reports. Keep it that way until every
    underlying counter type is changed to something wider.

    The benefit here is that users that *are* ready to handle proper
    wrap-arounds will be able to handle them automagically without
    undocumented hacks (see option 1).

    This changes the observed value and will cause "unexpected"
    wrap-arounds to happen earlier in some use-cases, which might
    upset users that are not ready to handle them, or don't want
    to poll /proc/stat more frequently.

    It's debatable what's better: a lower-width value that might
    need to be polled more often, or a wider-width value that is
    not completely accurate.


 3. Change the interrupt counter types to be wider.

    A different take on the issue: instead of narrowing the presentation
    from faux-u64 to unsigned it, widen the interrupt counters from
    unsigned int to... something else:

    - u64             interrupt counters are 64-bit everywhere, period

    - unsigned long   interrupt counters are 64-bit if the platform
                      thinks that "long" is longer than "int"

    Whatever the type is used, it must be the same for all interrupt
    counters across the kernel as well as the type used to compute
    and display the sum of all these counters by /proc/stat.

    The advantage here is that 64-bit counters will be probably enough
    for *anything* to not overflow anytime soon before the heat death
    of the universe, thus making the wrap-around problem irrelevant.

    The disadvantage here is that some hardware counters are 32-bit,
    and you can't make them wider. Some platforms also don't have
    proper atomic support for 64-bit integers, making wider counters
    problematic to implement efficiently.


So what do we do?
-----------------

I suggest to wrap interrupt counter sum at "unsigned int", the same
type used for (most) individual counters. That makes for the most
predictable behavior.

I have a patch set cooking that does this.

Will this be of any interest? Or do you think changing the behavior
of /proc/stat will cause more trouble than merit?


Prior discussion
----------------

This question is by no means new, it has been discussed several times:

2019 - genirq, proc: Speedup /proc/stat interrupt statistics

    The issue of overflow and wrap-around has been touched upon,
    suggesting that userspace should just deal with it. The issue of
    using u64 for the sum has been brought up too, but it did not
    go anywhere.

https://lore.kernel.org/all/20190208143255.9dec696b15f03bf00f4c60c2@linux-foundation.org/
https://lore.kernel.org/all/3460540b50784dca813a57ddbbd41656@AcuMS.aculab.com/


2014 - Why do we still have 32 bit counters? Interrupt counters overflow within 50 days

    Discussion on whether it's appropriate to bump counter width to
    64 bits in order to avoid the overflow issues entirely.

https://lore.kernel.org/lkml/alpine.DEB.2.11.1410030435260.8324@gentwo.org/


             reply	other threads:[~2021-09-10  8:53 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-10  8:53 Alexei Lozovsky [this message]
2021-09-11  3:48 ` [PATCH 0/7] proc/stat: Maintain monotonicity of "intr" and "softirq" Alexei Lozovsky
2021-09-11  3:48   ` [PATCH 1/7] genirq: Use unsigned int for irqs_sum Alexei Lozovsky
2021-09-11  3:48   ` [PATCH 2/7] powerpc/irq: arch_irq_stat_cpu() returns unsigned int Alexei Lozovsky
2021-09-11  3:48   ` [PATCH 3/7] x86/irq: " Alexei Lozovsky
2021-09-11  3:48   ` [PATCH 4/7] x86/irq: arch_irq_stat() " Alexei Lozovsky
2021-09-11  3:48   ` [PATCH 5/7] proc/stat: Use unsigned int for "intr" sum Alexei Lozovsky
2021-09-11  3:48   ` [PATCH 6/7] proc/stat: Use unsigned int for "softirq" sum Alexei Lozovsky
2021-09-11  3:48   ` [PATCH 7/7] docs: proc.rst: stat: Note the interrupt counter wrap-around Alexei Lozovsky
2021-09-11  3:59     ` Randy Dunlap
2021-09-12  9:30   ` [PATCH 0/7] proc/stat: Maintain monotonicity of "intr" and "softirq" Alexey Dobriyan
2021-09-12 12:37     ` Alexei Lozovsky
2021-09-14 14:11       ` Thomas Gleixner
2021-09-15  4:24         ` Alexei Lozovsky
2021-09-15 17:58   ` [PATCH v2 00/12] " Alexei Lozovsky
2021-09-15 17:58     ` [PATCH v2 01/12] genirq: Use READ_ONCE for IRQ counter reads Alexei Lozovsky
2021-09-15 17:58     ` [PATCH v2 02/12] genirq: Use unsigned long for IRQ counters Alexei Lozovsky
2021-09-15 17:58     ` [PATCH v2 03/12] powerpc/irq: Use READ_ONCE for IRQ counter reads Alexei Lozovsky
2021-09-15 17:58     ` [PATCH v2 04/12] powerpc/irq: Use unsigned long for IRQ counters Alexei Lozovsky
2021-09-15 17:58     ` [PATCH v2 05/12] powerpc/irq: Use unsigned long for IRQ counter sum Alexei Lozovsky
2021-09-15 17:58     ` [PATCH v2 06/12] x86/irq: Use READ_ONCE for IRQ counter reads Alexei Lozovsky
2021-09-15 17:58     ` [PATCH v2 07/12] x86/irq: Use unsigned long for IRQ counters Alexei Lozovsky
2021-09-15 17:58     ` [PATCH v2 08/12] x86/irq: Use unsigned long for IRQ counters more Alexei Lozovsky
2021-09-15 17:58     ` [PATCH v2 09/12] x86/irq: Use unsigned long for IRQ counter sum Alexei Lozovsky
2021-09-15 17:58     ` [PATCH v2 10/12] proc/stat: Use unsigned long for "intr" sum Alexei Lozovsky
2021-09-15 17:58     ` [PATCH v2 11/12] proc/stat: Use unsigned long for "softirq" sum Alexei Lozovsky
2021-09-15 17:58     ` [PATCH v2 12/12] docs: proc.rst: stat: Note the interrupt counter wrap-around Alexei Lozovsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=06F4B1B0-E4DE-4380-A8E1-A5ACAD285163@ilammy.net \
    --to=me@ilammy.net \
    --cc=adobriyan@gmail.com \
    --cc=cl@linux.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --subject='Re: /proc/stat interrupt counter wrap-around' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).