From: Waiman Long <longman@redhat.com>
To: Alexey Dobriyan <adobriyan@gmail.com>
Cc: linux-kernel@vger.kernel.org, rdunlap@infradead.org,
	akpm@linux-foundation.org
Subject: Re: [PATCH] proc/stat: Separate out individual irq counts into /proc/stat_irqs
Date: Thu, 19 Apr 2018 16:58:53 -0400	[thread overview]
Message-ID: <c0530769-50c3-87ad-1558-164873482130@redhat.com> (raw)
In-Reply-To: <20180419203949.GA4555@avx2>

On 04/19/2018 04:39 PM, Alexey Dobriyan wrote:
> On Thu, Apr 19, 2018 at 04:21:14PM -0400, Waiman Long wrote:
>> On 04/19/2018 03:55 PM, Alexey Dobriyan wrote:
>>> On Thu, Apr 19, 2018 at 03:28:40PM -0400, Waiman Long wrote:
>>>> On 04/19/2018 03:08 PM, Alexey Dobriyan wrote:
>>>>>> Therefore, application performance can be impacted if the application
>>>>>> reads /proc/stat rather frequently.
>>>>> [nods]
>>>>> Text interfaces can be designed in a very stupid way.
>>>>>
>>>>>> For example, reading /proc/stat on a certain 2-socket Skylake server
>>>>>> took about 4.6 ms because it had over 5k irqs.
>>>>> Is this top(1)? What is this application doing?
>>>>> If it needs percpu usage stats, then maybe /proc/stat should be
>>>>> converted away from single_open() so that core seq_file code doesn't
>>>>> generate everything at once.
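A minimal sketch of what moving away from single_open() could look like, so
the seq_file core pulls one record per ->show call instead of building the
whole buffer at once. This assumes the fs/proc/stat.c context and uses
hypothetical stat_seq_* names, not existing kernel symbols:

static void *stat_seq_start(struct seq_file *m, loff_t *pos)
{
	/* one record per possible CPU; returning NULL ends the iteration */
	return *pos < nr_cpu_ids ? (void *)(uintptr_t)(*pos + 1) : NULL;
}

static void *stat_seq_next(struct seq_file *m, void *v, loff_t *pos)
{
	++*pos;
	return stat_seq_start(m, pos);
}

static void stat_seq_stop(struct seq_file *m, void *v)
{
}

static int stat_seq_show(struct seq_file *m, void *v)
{
	int cpu = (uintptr_t)v - 1;

	/* emit only this CPU's "cpuN ..." line; the intr and other lines
	 * would become records of their own in the same fashion */
	seq_printf(m, "cpu%d ...\n", cpu);
	return 0;
}

static const struct seq_operations stat_seq_ops = {
	.start	= stat_seq_start,
	.next	= stat_seq_next,
	.stop	= stat_seq_stop,
	.show	= stat_seq_show,
};

The open callback would then call seq_open(file, &stat_seq_ops) instead of
single_open(), so a reader that only wants the first few lines never pays
for generating the rest.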
>>>> The application is actually a database benchmarking tool used by a
>>>> customer.
>>> So it probably needs the lines before the "intr" line.
>>>
>>>> The reading of /proc/stat is an artifact of the benchmarking
>>>> tool that can actually be turned off. Without doing that, about 20% of
>>>> CPU time was spent reading /proc/stat, and the thrashing of cachelines
>>>> slowed the benchmark numbers quite significantly. However, I was also
>>>> told that there are legitimate cases where reading /proc/stat was
>>>> necessary in some of their applications.
>>>>
>>>>>> -
>>>>>> -	/* sum again ? it could be updated? */
>>>>>> -	for_each_irq_nr(j)
>>>>>> -		seq_put_decimal_ull(p, " ", kstat_irqs_usr(j));
>>>>>> -
>>>>> This is direct userspace breakage.
>>>> Yes, I am aware of that. That is the cost of improving the performance
>>>> of applications that read /proc/stat, but don't need the individual irq
>>>> counts.
>>> Yeah, but all it takes is one script which cares.
>>>
>>> I have an idea.
>>>
>>> Maintain a "maximum registered irq #"; it should be much smaller than
>>> "nr_irqs":
>>>
>>> intr 4245359 151 0 0 0 0 0 0 0 38 0 0 0 0 0 0 0 0 0 64 0 0 0 0 0 0 0 0 0 0 44330 182364 57741 0 0 0 0 0 0 0 0 85 89124 0 0 0 0 0 323360 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
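As a rough sketch of that suggestion, assuming the irq core exported a
hypothetical max_registered_irq (updated wherever irq descriptors are
allocated), the show_stat() loop could stop at the last irq that was ever
registered instead of walking all of nr_irqs:

	/* max_registered_irq is hypothetical, maintained by the irq core */
	for (j = 0; j <= max_registered_irq; j++)
		seq_put_decimal_ull(p, " ", kstat_irqs_usr(j));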
>> Yes, that can probably help.
>>
>> This is the data from the problematic Skylake server:
>>
>> model name    : Intel(R) Xeon(R) Gold 6136 CPU @ 3.00GHz
>> 56 sosreport-carevalo.02076935-20180413085327/proc/stat
>> Interrupts: 5370
>> Interrupts without "0" entries: 1011
>>
>> There are still quite a large number of non-zero entries, though.
>>
>>> Or maintain array of registered irqs and iterate over them only.
>> Right, we can allocate a bitmap of used irqs to do that.
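A rough sketch of the bitmap idea, with hypothetical names (stat_irqs_seen
and the registration hook do not exist in the kernel); never-registered irqs
still get a " 0" so the field positions userspace parses do not change:

/* bitmap of irqs that have ever had a descriptor allocated
 * (NR_IRQS used here for simplicity; sparse irq setups would need
 * a bitmap sized to cover nr_irqs) */
static DECLARE_BITMAP(stat_irqs_seen, NR_IRQS);

/* in the irq descriptor allocation path (hypothetical hook) */
set_bit(irq, stat_irqs_seen);

/* in show_stat(): only do the expensive per-cpu summing for irqs
 * that were actually registered at some point */
for_each_irq_nr(j) {
	if (test_bit(j, stat_irqs_seen))
		seq_put_decimal_ull(p, " ", kstat_irqs_usr(j));
	else
		seq_put_decimal_ull(p, " ", 0);
}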
>>
>>> I have another idea.
>>>
>>> perf record shows mutex_lock/mutex_unlock at the top.
>>> Most of them are the irq mutex, not the seqfile mutex, as there are many
>>> more interrupts than reads. Take it once.
>>>
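A sketch of that idea: kstat_irqs_usr() takes and drops the sparse irq mutex
for every single irq, so hoisting the lock outside the loop turns nr_irqs
lock/unlock pairs into one. Note that irq_lock_sparse()/irq_unlock_sparse()
currently live in kernel/irq/internals.h, so they would have to be made
visible to fs/proc/stat.c for this to build there:

	irq_lock_sparse();
	for_each_irq_nr(j)
		seq_put_decimal_ull(p, " ", kstat_irqs(j));
	irq_unlock_sparse();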
>> How many CPUs are in your test system? In that Skylake server, it was
>> the per-cpu summing of the irq counts that consumed most of the time
>> spent reading /proc/stat. I think we can certainly try to optimize the
>> lock taking.
> It's 16x (NR_IRQS: 4352, nr_irqs: 960, preallocated irqs: 16).
> Given that irq registration is a rare operation, maintaining a sorted array
> of irqs should be the best option.
>> For the time being, I think I am going to add a /proc/stat2 clone as
>> suggested in my earlier email. Alternatively, I can put it somewhere
>> in sysfs if you have a good idea of where it should go.
> sysfs is strictly one-value-per-file.
>
>> I will also look into ways to optimize the current per-IRQ stats
>> handling, but it will come later.
> There is always the time-honored way of an ioctl(2) switching irq info off;
> /proc supports that.
>
> There are many options.

OK, that is good to know. Do you have any existing code snippet in the
kernel that I can use as a reference for how to do ioctl(2) switching?
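For illustration only, a minimal sketch of what such a per-open toggle could
look like; PROC_STAT_NO_IRQS and stat_ioctl() are made-up names, not an
existing kernel interface:

#define PROC_STAT_NO_IRQS	_IO('S', 1)	/* hypothetical command */

static long stat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
	struct seq_file *m = file->private_data;

	switch (cmd) {
	case PROC_STAT_NO_IRQS:
		/* remember the choice for this open file; show_stat()
		 * would check m->private and skip the per-irq counts */
		m->private = (void *)1UL;
		return 0;
	default:
		return -ENOTTY;
	}
}

It would be wired up through .unlocked_ioctl in the /proc/stat
file_operations, and a reader that does not care about the per-irq counts
would issue the ioctl once right after open(2).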

I will look into how to optimize the existing per-IRQ stats code first
before venturing into cloning /proc/stat.

Cheers,
Longman

