LKML Archive on lore.kernel.org
From: Mike Kravetz <mike.kravetz@oracle.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	David Hildenbrand <david@redhat.com>,
	Michal Hocko <mhocko@suse.com>,
	Oscar Salvador <osalvador@suse.de>, Zi Yan <ziy@nvidia.com>,
	Muchun Song <songmuchun@bytedance.com>,
	Naoya Horiguchi <naoya.horiguchi@linux.dev>,
	David Rientjes <rientjes@google.com>
Subject: Re: [PATCH RESEND 0/8] hugetlb: add demote/split page functionality
Date: Mon, 16 Aug 2021 17:17:50 -0700
Message-ID: <5dd4e07b-d2cf-63f2-fc0a-9b371b469a44@oracle.com>
In-Reply-To: <20210816162305.b19bfa3f3ba7431a62ff205f@linux-foundation.org>

On 8/16/21 4:23 PM, Andrew Morton wrote:
> On Mon, 16 Aug 2021 15:49:45 -0700 Mike Kravetz <mike.kravetz@oracle.com> wrote:
> 
>> This is a resend of PATCHes sent here [4].  There was some discussion
>> and interest when the RFC [5] was sent, but little after that.  The
>> resend is just a rebase of [4] to next-20210816 with a few typos in
>> commit messages fixed.
>>
>> Original Cover Letter
>> ---------------------
>> The concurrent use of multiple hugetlb page sizes on a single system
>> is becoming more common.  One of the reasons is better TLB support for
>> gigantic page sizes on x86 hardware.  In addition, hugetlb pages are
>> being used to back VMs in hosting environments.
>>
>> When using hugetlb pages to back VMs in such environments, it is
>> sometimes desirable to preallocate hugetlb pools.  This avoids the delay
>> and uncertainty of allocating hugetlb pages at VM startup.  In addition,
>> preallocating huge pages minimizes the issue of memory fragmentation that
>> increases the longer the system is up and running.
>>
>> In such environments, a combination of larger and smaller hugetlb pages
>> are preallocated in anticipation of backing VMs of various sizes.  Over
>> time, the preallocated pool of smaller hugetlb pages may become
>> depleted while larger hugetlb pages still remain.  In such situations,
>> it may be desirable to convert larger hugetlb pages to smaller hugetlb
>> pages.
>>
>> Converting larger to smaller hugetlb pages can be accomplished today by
>> first freeing the larger page to the buddy allocator and then allocating
>> the smaller pages.  However, there are two issues with this approach:
>> 1) This process can take quite some time, especially if allocation of
>>    the smaller pages is not immediate and requires migration/compaction.
>> 2) There is no guarantee that the total size of smaller pages allocated
>>    will match the size of the larger page which was freed.  This is
>>    because the area freed by the larger page could quickly be
>>    fragmented.
>>
>> To address these issues, introduce the concept of hugetlb page demotion.
>> Demotion provides a means of 'in place' splitting a hugetlb page to
>> pages of a smaller size.  For example, on x86 one 1G page can be
>> demoted to 512 2M pages.  Page demotion is controlled via sysfs files.
>> - demote_size   Read only target page size for demotion
> 
> Should this be "write only"?  If not, I'm confused.
> 
> If "yes" then "write only" would be a misnomer - clearly this file is
> readable (looks at demote_size_show()).
> 

It is read only and is there mostly as information for the user.  When
they demote a page, this is the size to which the page will be demoted.

For example,
# pwd
/sys/kernel/mm/hugepages/hugepages-1048576kB
# cat demote_size
2048kB
# pwd
/sys/kernel/mm/hugepages/hugepages-2048kB
# cat demote_size
4kB
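
As a minimal sketch of the same check across all page sizes at once, the
loop below prints the demote target for every hugetlb size directory it
finds.  The function name show_demote_sizes and its optional root
argument are illustrative assumptions, not part of the patch series; the
default path is the one shown above.

```shell
# Sketch: print "<page size> -> <demote_size>" for each hugetlb size.
# show_demote_sizes is a hypothetical helper name; the optional first
# argument overrides the sysfs root (useful for trying it on a copy).
show_demote_sizes() {
    root="${1:-/sys/kernel/mm/hugepages}"
    for dir in "$root"/hugepages-*; do
        # Skip sizes (or kernels) that do not expose demote_size.
        [ -r "$dir/demote_size" ] || continue
        printf '%s -> %s\n' "${dir##*/hugepages-}" "$(cat "$dir/demote_size")"
    done
}
```

Calling show_demote_sizes with no argument reads the live sysfs tree; on
a kernel without this series it simply prints nothing.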

The "demote size" is not user configurable, although that is
something Oscar brought up previously.  I did not directly address
this in the RFC.  My bad.  However, I do not like the idea of making
demote_size writable/selectable.  My concern would be someone changing
the value and not resetting it.  It certainly is something that can be
done with minor code changes.

>> - demote        Writable number of hugetlb pages to be demoted
> 
> So how does this interface work?  Write the target size to
> `demote_size', write the number of to-be-demoted larger pages to
> `demote' and then the operation happens?
> 
> If so, how does one select which size pages should be selected for
> the demotion?

The location in the sysfs directory tells you what size pages will be
demoted.  For example,

echo 5 > /sys/kernel/mm/hugepages/hugepages-1048576kB/demote

says to demote 5 1GB pages.
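
To make the arithmetic concrete, here is a small sketch of how many
smaller pages such a request can produce.  The helper name
demoted_page_count is an assumption made for illustration.  The result
is an upper bound, since fewer pages than requested may actually be
demoted.

```shell
# Sketch: number of smaller pages produced when `count` pages of size
# `from_kb` are all demoted to pages of size `to_kb` (sizes in kB).
# demoted_page_count is a hypothetical helper, not a kernel interface.
demoted_page_count() {
    count=$1
    from_kb=$2
    to_kb=$3
    echo $(( count * (from_kb / to_kb) ))
}

# Five 1GB pages demoted to 2MB pages: 5 x 512 = 2560
demoted_page_count 5 1048576 2048
```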

The demote files also exist in node-specific directories, so you can
even pick huge pages from a specific node.

echo 5 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/demote
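
The two locations differ only in their prefix, which the following
sketch captures.  The helper name demote_path and its argument order are
assumptions for illustration; the paths themselves are the ones used in
the examples above.

```shell
# Sketch: build the path of a demote file.
# Usage: demote_path SIZE_KB [NODE]
# demote_path is a hypothetical helper; with no NODE it returns the
# global file, otherwise the per-node file.
demote_path() {
    size_kb=$1
    node=${2:-}
    if [ -n "$node" ]; then
        echo "/sys/devices/system/node/node${node}/hugepages/hugepages-${size_kb}kB/demote"
    else
        echo "/sys/kernel/mm/hugepages/hugepages-${size_kb}kB/demote"
    fi
}
```

With this, `echo 5 > "$(demote_path 1048576 1)"` would be equivalent to
the node1 command above.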

> 
> And how does one know the operation has completed so the sysfs files
> can be reloaded for another operation?
> 

When the write to the file is complete, the operation has completed.
I am not exactly sure what you mean by reloading the sysfs files for
another operation.
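
Because the write is synchronous, a caller can compare pool counts
immediately before and after it.  The sketch below assumes that a
successful demotion lowers nr_hugepages in the source size directory,
which is an assumption about the implementation rather than something
stated in this thread.  The helper name demote_and_report is also
hypothetical, and concurrent pool changes would skew the result.

```shell
# Sketch: request a demotion and report how many pages actually left
# the pool.  Usage: demote_and_report DIR COUNT, where DIR is a
# hugepages-<size>kB sysfs directory.
# Assumption: nr_hugepages in DIR drops by one per demoted page.
demote_and_report() {
    dir=$1
    want=$2
    before=$(cat "$dir/nr_hugepages")
    # The write returns only once the demote operation has completed.
    echo "$want" > "$dir/demote" || return 1
    after=$(cat "$dir/nr_hugepages")
    echo "requested=$want demoted=$(( before - after ))"
}
```

For example, `demote_and_report /sys/kernel/mm/hugepages/hugepages-1048576kB 5`
would show whether all five pages were demoted or only some of them.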

>> Only hugetlb pages which are free at the time of the request can be demoted.
>> Demotion does not add to the complexity of surplus pages.  Demotion also honors
>> reserved huge pages.  Therefore, when a value is written to the sysfs demote
>> file, that value is only the maximum number of pages which will be demoted.
>> It is possible fewer will actually be demoted.
>>
>> If demote_size is PAGESIZE, demote will simply free pages to the buddy
>> allocator.
>>
>> Real world use cases
>> --------------------
>> There are groups today using hugetlb pages to back VMs on x86.  Their
>> use case is as described above.  They have experienced the issues with
>> performance and not necessarily getting the excepted number smaller huge
> 
> ("expected")

yes, will fix typo

> 
>> pages after free/allocate cycle.
>>
> 
> It seems odd to add the interfaces in patch 1 then document them in
> patch 5.  Why not add-and-document in a single patch?
> 

Yes, makes sense.  Will combine these.
-- 
Mike Kravetz
