LKML Archive on lore.kernel.org
From: Michal Hocko <mhocko@kernel.org>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Cannon Matthews <cannonmatthews@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	Andi Kleen <ak@linux.intel.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Rientjes <rientjes@google.com>,
	Greg Thelen <gthelen@google.com>, Salman Qazi <sqazi@google.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org
Subject: Re: [PATCH] mm: clear 1G pages with streaming stores on x86
Date: Mon, 16 Mar 2020 11:18:56 +0100
Message-ID: <20200316101856.GH11482@dhcp22.suse.cz>
In-Reply-To: <20200311005447.jkpsaghrpk3c4rwu@box>

On Wed 11-03-20 03:54:47, Kirill A. Shutemov wrote:
> On Tue, Mar 10, 2020 at 05:21:30PM -0700, Cannon Matthews wrote:
> > On Mon, Mar 9, 2020 at 11:37 AM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Mon, Mar 09, 2020 at 08:38:31AM -0700, Andi Kleen wrote:
> > > > > Gigantic huge pages are a bit different. They are much less dynamic from
> > > > > the usage POV in my experience. Micro-optimizations for the first access
> > > > > tend not to matter at all as it is usually a pre-allocation scenario. On
> > > > > the other hand, speeding up the initialization sounds like a good thing
> > > > > in general. It will be a one-time benefit, but if the additional code
> > > > > is not hard to maintain then I would be inclined to take it even with
> > > > > the "artificial" numbers stated above. There really shouldn't be other
> > > > > downsides except for the code maintenance, right?
> > > >
> > > > There's a cautionary tale of the old crappy RAID5 XOR assembler functions,
> > > > which were optimized a long time ago for the Pentium 1 and stayed around
> > > > even though the compiler could actually do a better job.
> > > >
> > > > String instructions are constantly improving in performance (Broadwell is
> > > > very old at this point). Most likely over time (and maybe even today
> > > > on newer CPUs) you would need much more sophisticated unrolled MOVNTI variants
> > > > (or maybe even AVX-*) to be competitive.
> > >
> > > Presumably you have access to current and maybe even some unreleased
> > > CPUs ... I mean, he's posted the patches, so you can test this hypothesis.
> > 
> > I don't have the data at hand, but could reproduce it if strongly
> > desired. I've also tested this on Skylake and Cascade Lake, and
> > we've had success running with this for a while now.
> > 
> > When developing this originally, I compared all of this with
> > AVX-* instructions as well as the string ops. They all seemed to be
> > functionally equivalent, and all were beaten by this MOVNTI approach for
> > large regions of 1G pages.
> > 
> > There is probably room to further optimize the MOVNTI code with better
> > loop unrolling or other optimizations. If anyone has specific suggestions,
> > I'm happy to try to incorporate them, but this has been shown to be
> > effective as written so far, and I think I lack the assembly expertise
> > to micro-optimize further on my own.
> 
> Andi's point is that string instructions might be a better bet in the long
> run. You may win something with MOVNTI on current CPUs, but it may become
> a burden on newer microarchitectures as string instructions improve.
> Nobody would realistically re-validate whether the MOVNTI micro-optimization
> still makes sense for every new microarchitecture.
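The patch itself is not quoted in this message. To make the two strategies being argued about concrete, here is a hypothetical user-space sketch (not the code from the patch): `clear_movnti()` zeroes memory with 8-byte non-temporal stores through the SSE2 intrinsic `_mm_stream_si64`, which compiles to MOVNTI, while `clear_rep_stosb()` issues a single string instruction. The function names, the 8x unroll, and the alignment and length requirements are illustrative assumptions; on non-x86-64 builds both fall back to `memset()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#if defined(__x86_64__)
#include <emmintrin.h> /* _mm_stream_si64 (MOVNTI), _mm_sfence (SFENCE) */
#endif

/* Hypothetical helper: zero len bytes using MOVNTI non-temporal stores.
 * Assumes dst is 8-byte aligned and len is a multiple of 64 bytes. */
static void clear_movnti(void *dst, size_t len)
{
    assert(len % 64 == 0);
#if defined(__x86_64__)
    long long *p = dst;
    size_t n = len / sizeof(*p);

    for (size_t i = 0; i < n; i += 8) {
        /* Unrolled by 8: one 64-byte cache line per iteration. */
        _mm_stream_si64(p + i + 0, 0);
        _mm_stream_si64(p + i + 1, 0);
        _mm_stream_si64(p + i + 2, 0);
        _mm_stream_si64(p + i + 3, 0);
        _mm_stream_si64(p + i + 4, 0);
        _mm_stream_si64(p + i + 5, 0);
        _mm_stream_si64(p + i + 6, 0);
        _mm_stream_si64(p + i + 7, 0);
    }
    /* Non-temporal stores are weakly ordered; fence them before the
     * cleared memory is handed to anyone else. */
    _mm_sfence();
#else
    memset(dst, 0, len); /* portable fallback */
#endif
}

/* Hypothetical helper: zero len bytes with a single REP STOSB. */
static void clear_rep_stosb(void *dst, size_t len)
{
#if defined(__x86_64__)
    /* RDI = destination, RCX = byte count, AL = fill byte.
     * The direction flag is clear on function entry per the ABI. */
    __asm__ volatile("rep stosb"
                     : "+D"(dst), "+c"(len)
                     : "a"(0)
                     : "memory");
#else
    memset(dst, 0, len); /* portable fallback */
#endif
}

/* Returns 1 if every byte in buf is zero, 0 otherwise. */
static int is_zero(const void *buf, size_t len)
{
    const unsigned char *b = buf;
    for (size_t i = 0; i < len; i++)
        if (b[i] != 0)
            return 0;
    return 1;
}
```

The MOVNTI loop deliberately bypasses the cache, which is the intended win for a 1G region that will not be read back soon, and it hard-codes one particular unroll that someone would have to re-tune per CPU. REP STOSB leaves the store strategy to the microarchitecture, which is why Andi expects it to keep improving without code changes.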

While this might be true, isn't that easily solvable with the existing
ALTERNATIVE and CPU features framework? Could we have a feature bit
indicating that MOVNTI is worthwhile for large data copy routines?
Probably something for the x86 maintainers.
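For context, x86 kernels of this era already select a page-clearing routine this way: `clear_page()` uses `alternative_call_2()` to patch in `clear_page_rep` or `clear_page_erms` depending on `X86_FEATURE_REP_GOOD` and `X86_FEATURE_ERMS`. The hypothetical user-space sketch below only imitates the idea with a function pointer chosen once at startup; the real ALTERNATIVE mechanism rewrites the call instruction at boot, so there is no indirect call. `FEAT_MOVNTI_BULK` is an invented bit standing in for the "MOVNTI is worthwhile" flag suggested above, and the clearing routines are stubs that call `memset()` and record which path was selected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative feature bits. FEAT_ERMS mirrors the real ERMS CPUID flag;
 * FEAT_MOVNTI_BULK is hypothetical ("MOVNTI wins for huge clears"). */
#define FEAT_ERMS        (1u << 0)
#define FEAT_MOVNTI_BULK (1u << 1)

typedef void (*clear_fn)(void *dst, size_t len);

/* Records which implementation ran, so the selection can be checked. */
static const char *last_path = "none";

/* Stubs: a real implementation would use REP STOSB or MOVNTI here. */
static void clear_generic(void *dst, size_t len)
{
    last_path = "generic";
    memset(dst, 0, len);
}

static void clear_erms(void *dst, size_t len)
{
    last_path = "erms";
    memset(dst, 0, len);
}

static void clear_movnti_bulk(void *dst, size_t len)
{
    last_path = "movnti";
    memset(dst, 0, len);
}

/* Hypothetical hook, selected once at init time. */
static clear_fn clear_huge_page_impl = clear_generic;

/* Pick the highest-priority implementation the feature mask allows. */
static void select_clear_impl(unsigned int features)
{
    /* This sketch rejects feature bits it does not know about. */
    assert((features & ~(FEAT_ERMS | FEAT_MOVNTI_BULK)) == 0);

    if (features & FEAT_MOVNTI_BULK)
        clear_huge_page_impl = clear_movnti_bulk;
    else if (features & FEAT_ERMS)
        clear_huge_page_impl = clear_erms;
    else
        clear_huge_page_impl = clear_generic;
}
```

This addresses the maintenance concern only partially: the bit is useful only if someone sets it correctly for each new microarchitecture. Because an unset bit falls back to the string-instruction path, a stale bit costs performance but not correctness.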
-- 
Michal Hocko
SUSE Labs

Thread overview: 23+ messages
2020-03-07  1:03 Cannon Matthews
2020-03-07 22:06 ` Andrew Morton
2020-03-09  0:08 ` Kirill A. Shutemov
2020-03-09  9:06   ` Michal Hocko
2020-03-09  9:35     ` Kirill A. Shutemov
2020-03-09 11:36     ` Kirill A. Shutemov
2020-03-09 12:26       ` Michal Hocko
2020-03-09 18:01         ` Mike Kravetz
2020-03-09 15:38     ` Andi Kleen
2020-03-09 18:37       ` Matthew Wilcox
2020-03-11  0:21         ` Cannon Matthews
2020-03-11  0:54           ` Kirill A. Shutemov
2020-03-11  3:35             ` Arvind Sankar
2020-03-11  8:16               ` Kirill A. Shutemov
2020-03-11 18:32                 ` Arvind Sankar
2020-03-11 20:32                   ` Arvind Sankar
2020-03-12  0:52                     ` Kirill A. Shutemov
2020-03-31  0:40                   ` Elliott, Robert (Servers)
2020-03-16 10:18             ` Michal Hocko [this message]
2020-03-16 12:19               ` Kirill A. Shutemov
2020-03-26 19:46                 ` Matthew Wilcox
2020-03-11 15:07       ` David Laight
2020-03-09 15:33   ` Andi Kleen
