From: Mike Kravetz <>
To: Andrew Morton <>
	David Hildenbrand <>,
	Michal Hocko <>,
	Oscar Salvador <>, Zi Yan <>,
	Muchun Song <>,
	Naoya Horiguchi <>,
	David Rientjes <>
Subject: Re: [PATCH RESEND 0/8] hugetlb: add demote/split page functionality
Date: Mon, 16 Aug 2021 17:17:50 -0700
Message-ID: <>
In-Reply-To: <>

On 8/16/21 4:23 PM, Andrew Morton wrote:
> On Mon, 16 Aug 2021 15:49:45 -0700 Mike Kravetz <> wrote:
>> This is a resend of PATCHes sent here [4].  There was some discussion
>> and interest when the RFC [5] was sent, but little after that.  The
>> resend is just a rebase of [4] to next-20210816 with a few typos in
>> commit messages fixed.
>>
>> Original Cover Letter
>> ---------------------
>> The concurrent use of multiple hugetlb page sizes on a single system
>> is becoming more common.  One of the reasons is better TLB support for
>> gigantic page sizes on x86 hardware.  In addition, hugetlb pages are
>> being used to back VMs in hosting environments.
>>
>> When using hugetlb pages to back VMs in such environments, it is
>> sometimes desirable to preallocate hugetlb pools.  This avoids the delay
>> and uncertainty of allocating hugetlb pages at VM startup.  In addition,
>> preallocating huge pages minimizes the issue of memory fragmentation that
>> increases the longer the system is up and running.
>>
>> In such environments, a combination of larger and smaller hugetlb pages
>> is preallocated in anticipation of backing VMs of various sizes.  Over
>> time, the preallocated pool of smaller hugetlb pages may become
>> depleted while larger hugetlb pages still remain.  In such situations,
>> it may be desirable to convert larger hugetlb pages to smaller hugetlb
>> pages.
>>
>> Converting larger to smaller hugetlb pages can be accomplished today by
>> first freeing the larger page to the buddy allocator and then allocating
>> the smaller pages.  However, there are two issues with this approach:
>> 1) This process can take quite some time, especially if allocation of
>>    the smaller pages is not immediate and requires migration/compaction.
>> 2) There is no guarantee that the total size of smaller pages allocated
>>    will match the size of the larger page which was freed.  This is
>>    because the area freed by the larger page could quickly be
>>    fragmented.
>>
>> To address these issues, introduce the concept of hugetlb page demotion.
>> Demotion provides a means of 'in place' splitting of a hugetlb page into
>> pages of a smaller size.  For example, on x86 one 1G page can be
>> demoted to 512 2M pages.  Page demotion is controlled via sysfs files.
>> - demote_size   Read only target page size for demotion
> Should this be "write only"?  If not, I'm confused.
> If "yes" then "write only" would be a misnomer - clearly this file is
> readable (looks at demote_size_show()).

It is read only and is there mostly as information for the user.  When
they demote a page, this is the size to which the page will be demoted.

For example,
# pwd
# cat demote_size
# pwd
# cat demote_size
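
To make the relationship between the two sizes concrete, here is a rough
sketch of the arithmetic only.  pages_after_demote is a hypothetical
helper, not something from the patch series, and it takes both sizes as
plain kB numbers rather than parsing the demote_size file.

```shell
# Hypothetical helper (an illustration, not part of the patches).
# $1 = size of the page being demoted, in kB
# $2 = demote_size, in kB
# $3 = number of pages actually demoted
pages_after_demote() {
    echo $(( $1 / $2 * $3 ))
}

# One 1G page (1048576 kB) demoted to 2M pages (2048 kB) gives 512 pages.
pages_after_demote 1048576 2048 1
# Five 1G pages would give 2560 2M pages.
pages_after_demote 1048576 2048 5
```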

The "demote size" is not user configurable, although Oscar brought that
up previously.  I did not directly address this in the RFC.  My bad.
However, I do not like the idea of making demote_size writable/selectable.
My concern would be someone changing the value and not resetting it.
It certainly is something that can be done with minor code changes.

>> - demote        Writable number of hugetlb pages to be demoted
> So how does this interface work?  Write the target size to
> `demote_size', write the number of to-be-demoted larger pages to
> `demote' and then the operation happens?
> If so, how does one select which size pages should be selected for
> the demotion?

The location in the sysfs directory tells you what size pages will be
demoted.  For example,

echo 5 > /sys/kernel/mm/hugepages/hugepages-1048576kB/demote

says to demote 5 1GB pages.
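
The same thing can be wrapped in a small script.  This is only a sketch:
demote_path and request_demote are hypothetical helper names, and
HUGEPAGES_DIR is an assumed override so the logic can be tried against a
scratch directory instead of the real sysfs tree.

```shell
# Assumption: defaults to the global hugepages directory used above.
HUGEPAGES_DIR="${HUGEPAGES_DIR:-/sys/kernel/mm/hugepages}"

# Print the demote file for pages of the given size.
# $1 = size of the pages to demote, in kB (the directory selects the size)
demote_path() {
    printf '%s/hugepages-%skB/demote\n' "$HUGEPAGES_DIR" "$1"
}

# Ask for up to $2 free pages of size $1 kB to be demoted.
request_demote() {
    printf '%s\n' "$2" > "$(demote_path "$1")"
}
```

With these, "request_demote 1048576 5" would do the same write as the
echo above.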

demote files are also in node specific directories so you can even pick
huge pages from a specific node.

echo 5 >

> And how does one know the operation has completed so the sysfs files
> can be reloaded for another operation?

When the write to the file is complete, the operation has completed.
Not exactly sure what you mean by reloading the sysfs files for
another operation?
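
Because the write does not return until the demotion is done, a script
can compare the free page count right before and right after it.  A
minimal sketch, assuming that free_hugepages for the source size drops by
the number of pages demoted and that nothing else changes the pool in
between.  demote_and_count and HUGEPAGES_DIR are hypothetical names used
only for this illustration.

```shell
# Assumption: overridable so the logic can be exercised on a scratch tree.
HUGEPAGES_DIR="${HUGEPAGES_DIR:-/sys/kernel/mm/hugepages}"

# $1 = size of the pages to demote, in kB
# $2 = maximum number of pages to demote
# Prints the drop in free pages of size $1 across the write.
demote_and_count() {
    dir="$HUGEPAGES_DIR/hugepages-${1}kB"
    before=$(cat "$dir/free_hugepages") || return 1
    # The write is synchronous, so the count can be read back immediately.
    printf '%s\n' "$2" > "$dir/demote" || return 1
    after=$(cat "$dir/free_hugepages") || return 1
    echo $(( before - after ))
}
```

The printed value may be lower than the number requested, since the value
written is only an upper bound on the pages demoted.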

>> Only hugetlb pages which are free at the time of the request can be demoted.
>> Demotion does not add to the complexity of surplus pages.  Demotion also honors
>> reserved huge pages.  Therefore, when a value is written to the sysfs demote
>> file, that value is only the maximum number of pages which will be demoted.
>> It is possible fewer will actually be demoted.
>>
>> If demote_size is PAGESIZE, demote will simply free pages to the buddy
>> allocator.
>>
>> Real world use cases
>> --------------------
>> There are groups today using hugetlb pages to back VMs on x86.  Their
>> use case is as described above.  They have experienced the issues with
>> performance and not necessarily getting the excepted number smaller huge
> ("expected")

yes, will fix typo

>> pages after free/allocate cycle.
> It seems odd to add the interfaces in patch 1 then document them in
> patch 5.  Why not add-and-document in a single patch?

Yes, makes sense.  Will combine these.

Mike Kravetz

