LKML Archive on lore.kernel.org
From: Michal Hocko <mhocko@kernel.org>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Cannon Matthews <cannonmatthews@google.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Matthew Wilcox <willy@infradead.org>,
	David Rientjes <rientjes@google.com>,
	Greg Thelen <gthelen@google.com>, Salman Qazi <sqazi@google.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	ak@linux.intel.com, x86@kernel.org
Subject: Re: [PATCH] mm: clear 1G pages with streaming stores on x86
Date: Mon, 9 Mar 2020 13:26:46 +0100
Message-ID: <20200309122646.GM8447@dhcp22.suse.cz>
In-Reply-To: <20200309113658.bctbw35e73ahhgbu@box>

On Mon 09-03-20 14:36:58, Kirill A. Shutemov wrote:
> On Mon, Mar 09, 2020 at 10:06:30AM +0100, Michal Hocko wrote:
> > On Mon 09-03-20 03:08:20, Kirill A. Shutemov wrote:
> > > On Fri, Mar 06, 2020 at 05:03:53PM -0800, Cannon Matthews wrote:
> > > > Reimplement clear_gigantic_page() to clear gigabyte pages using the
> > > > non-temporal streaming store instruction (movnti), which bypasses the
> > > > cache, since an entire 1GiB region will not fit in the cache anyway.
> > > > 
> > > > Doing an mlock() on a 512GiB 1G-hugetlb region previously would take on
> > > > average 134 seconds, about 260ms/GiB which is quite slow. Using `movnti`
> > > > and optimizing the control flow over the constituent small pages, this
> > > > can be improved roughly by a factor of 3-4x, with the 512GiB mlock()
> > > > taking only 34 seconds on average, or 67ms/GiB.
> > > > 
> > > > The assembly code for the __clear_page_nt routine is more or less
> > > > taken directly from the output of gcc with -O3 for this function with
> > > > some tweaks to support arbitrary sizes and moving memory barriers:
> > > > 
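> > > > #include <stddef.h>
> > > > #include <immintrin.h>	/* _mm_stream_si64, _mm_sfence */
> > > > #define GiB (1UL << 30)
> > > > void clear_page_nt_64i (void *page)
> > > > {
> > > >   for (size_t i = 0; i < GiB / sizeof(long long int); ++i)
> > > >     {
> > > >       _mm_stream_si64 (((long long int*)page) + i, 0);
> > > >     }
> > > >   _mm_sfence();	/* order the streaming stores before later accesses */
> > > > }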
> > > > 
> > > > Tested:
> > > > 	Time to `mlock()` a 512GiB region on broadwell CPU
> > > > 				AVG time (s)	% imp.	ms/page
> > > > 	clear_page_erms		133.584		-	261
> > > > 	clear_page_nt		34.154		74.43%	67
> > > 
> > > Some macrobenchmark would be great too.
> > > 
> > > > An earlier version of this code was sent as an RFC patch ~July 2018
> > > > https://patchwork.kernel.org/patch/10543193/ but never merged.
> > > 
> > > Andi and I tried to use MOVNTI for large/gigantic page clearing back in
> > > 2012[1]. Maybe it can be useful.
> > > 
> > > That patchset is somewhat more complex trying to keep the memory around
> > > the fault address hot in cache. In theory it should help to reduce latency
> > > on the first access to the memory.
> > > 
> > > I was not able to get convincing numbers back then for the hardware of the
> > > time. Maybe it's better now.
> > > 
> > > https://lore.kernel.org/r/1345470757-12005-1-git-send-email-kirill.shutemov@linux.intel.com
> > 
> > Thanks for the reminder. I've had only a very vague recollection. Your
> > series had a much wider scope indeed. Since then we have gained
> > process_huge_page which tries to optimize normal huge pages.
> > 
> > Gigantic huge pages are a bit different. They are much less dynamic from
> > the usage POV in my experience. Micro-optimizations for the first access
> > tend not to matter at all, as it is usually a pre-allocation scenario.
> 
> The page gets cleared not on reservation, but on allocation, including at
> page fault time. Keeping the memory around the fault address hot in cache
> can still be beneficial.

You are right, of course. What I meant to say is that the GB-page-backed
workloads I have seen tend to pre-allocate during startup, so they do
not rely on lazy initialization during #PF. This is slightly easier to
handle for a resource that is essentially impossible to get on demand,
since an early failure is easier to deal with.
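
A minimal userspace sketch of that pre-allocation pattern, for illustration
only. The helper names prealloc_region() and prealloc_gigantic() are
hypothetical and not taken from any workload discussed in this thread; the
sketch assumes Linux with a 1GiB hugetlb pool configured. With MAP_HUGETLB
the pages are reserved at mmap() time, so a pool that is too small shows up
as an immediate failure, and MAP_POPULATE asks the kernel to fault (and
therefore clear) the pages up front instead of on first touch:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

/* Fallbacks for older libc headers; the values match the Linux UAPI. */
#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26
#endif
#ifndef MAP_HUGE_1GB
#define MAP_HUGE_1GB (30 << MAP_HUGE_SHIFT)
#endif

/*
 * Hypothetical helper: map `len` bytes of anonymous memory and ask the
 * kernel to populate it right away, so the zeroing cost is paid at startup
 * rather than at page fault time. Returns NULL if the mapping fails.
 */
static void *prealloc_region(size_t len, int extra_flags)
{
	assert(len > 0);
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE | extra_flags,
		       -1, 0);
	return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: request `len` bytes backed by 1GiB hugetlb pages. */
static void *prealloc_gigantic(size_t len)
{
	void *p = prealloc_region(len, MAP_HUGETLB | MAP_HUGE_1GB);

	if (!p)
		perror("mmap(MAP_HUGETLB | MAP_HUGE_1GB)"); /* early failure */
	return p;
}
```

A startup path would call prealloc_gigantic() once and bail out if it
returns NULL, rather than discovering the shortage later at fault time.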

If there are workloads which can benefit from page fault
micro-optimizations then all good, but this can be done on top and
demonstrated by numbers. It is much easier to demonstrate the speed
up on pre-initialization workloads. That's all I wanted to say here.
-- 
Michal Hocko
SUSE Labs

