LKML Archive on lore.kernel.org
From: Thomas Gleixner <tglx@linutronix.de>
To: "Li, Aubrey" <aubrey.li@linux.intel.com>
Cc: mingo@redhat.com, peterz@infradead.org, hpa@zytor.com,
	ak@linux.intel.com, tim.c.chen@linux.intel.com,
	dave.hansen@intel.com, arjan@linux.intel.com,
	aubrey.li@intel.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v11 2/3] x86,/proc/pid/status: Add AVX-512 usage elapsed time
Date: Sat, 16 Feb 2019 13:55:41 +0100 (CET)	[thread overview]
Message-ID: <alpine.DEB.2.21.1902161340320.1683@nanos.tec.linutronix.de> (raw)
In-Reply-To: <85b22e16-fffc-7ebe-2ab1-3b6fe7e036ab@linux.intel.com>

On Fri, 15 Feb 2019, Li, Aubrey wrote:
> On 2019/2/14 19:29, Thomas Gleixner wrote:
> Under this scenario, the elapsed time becomes longer than normal indeed, see below:
> 
> $ while [ 1 ]; do cat /proc/6985/status | grep AVX; sleep 1; done
> AVX512_elapsed_ms:	3432
> AVX512_elapsed_ms:	440
> AVX512_elapsed_ms:	1444
> AVX512_elapsed_ms:	2448
> AVX512_elapsed_ms:	3456
> AVX512_elapsed_ms:	460
> AVX512_elapsed_ms:	1464
> AVX512_elapsed_ms:	2468
> 
> But AFAIK, Google's Heracles does 15s polling, so this worst case is still acceptable?

I have no idea what Google's thingy does and you surely have to ask those
people who want to use this whether they are OK with that. I personally
think the numbers are largely useless, but I don't know the use case.

> > IOW, this needs crystal ball magic to decode because
> > there is no correlation between that elapsed time and the time when the
> > last context switch happened simply because that time is not available in
> > /proc/$PID/status. Sure you can oracle it out from /proc/$PID/stat with
> > even more crystal ball magic, but there is no explanation at all.
> > 
> > There may be use case scenarios where this crystal ball prediction is
> > actually useful, but the inaccuracy of that information and the possible
> > pitfalls for any user space application which uses it need to be documented
> > in detail. Without that, this is going to cause more trouble and confusion
> > than benefit.
> > 
> Not sure if the above experiment addressed your concern, please correct me if
> I totally misunderstood.

The above experiment just confirms what I said: the numbers are inaccurate
and potentially misleading to a large extent when the AVX-using task is not
scheduled out for a long time.

So what I'm asking for is proper documentation which explains how this
'hint' is generated in the kernel and why it can be completely inaccurate
and misleading. We don't want to end up in a situation where people start
to rely on this information and then have to go and read kernel code to
understand why the numbers do not make sense.

I'm not convinced that this interface in the current form is actually
useful. Even if you ignore the single task example, on a loaded machine
where tasks are scheduled in and out by time slices, the calculation is:

	    delta = (long)(jiffies - timestamp);

delta is what you expose as elapsed_ms. Now assume that the task is seen as
using AVX when being scheduled out. Depending on how long it then stays
scheduled out, whether due to lots of other tasks occupying the CPU or due
to a blocking syscall, the result can be completely misleading. The job
scheduler will see, for example, that the last AVX usage was recorded 80ms
ago, decide that this is just occasional usage and migrate the task away.
Now the task gets on the other CPU and starts using AVX again, which makes
the scheduler see a short delta and decide to move it back.

So interpreting the value is voodoo, which becomes even harder when there
is no documentation.

Thanks,

	tglx


Thread overview: 9+ messages
2019-02-13  2:37 [PATCH v11 1/3] /proc/pid/status: Add support for architecture specific output Aubrey Li
2019-02-13  2:37 ` [PATCH v11 2/3] x86,/proc/pid/status: Add AVX-512 usage elapsed time Aubrey Li
2019-02-14 11:29   ` Thomas Gleixner
2019-02-15  4:35     ` Li, Aubrey
2019-02-16 12:55       ` Thomas Gleixner [this message]
2019-02-16 17:05         ` Li, Aubrey
2019-02-20 15:35         ` David Laight
2019-02-20 15:38           ` Arjan van de Ven
2019-02-13  2:37 ` [PATCH v11 3/3] Documentation/filesystems/proc.txt: add AVX512_elapsed_ms Aubrey Li
