LKML Archive on lore.kernel.org
From: Mel Gorman <mel@csn.ul.ie>
To: Christoph Lameter <clameter@sgi.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/8] Create ZONE_MOVABLE to partition memory between movable and non-movable pages
Date: Fri, 26 Jan 2007 16:48:04 +0000 (GMT)
Message-ID: <Pine.LNX.4.64.0701261629050.23091@skynet.skynet.ie>
In-Reply-To: <Pine.LNX.4.64.0701260812150.6141@schroedinger.engr.sgi.com>

On Fri, 26 Jan 2007, Christoph Lameter wrote:

> On Thu, 25 Jan 2007, Mel Gorman wrote:
>
>> The following 8 patches against 2.6.20-rc4-mm1 create a zone called
>> ZONE_MOVABLE that is only usable by allocations that specify both __GFP_HIGHMEM
>> and __GFP_MOVABLE. This has the effect of keeping all non-movable pages
>> within a single memory partition while allowing movable allocations to be
> satisfied from either partition.
>
> For arches that do not have HIGHMEM, other zones would be okay too, it
> seems.
>

It would, but it'd obscure the code to take advantage of that.
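A minimal sketch of the zone rule described in the cover letter: ZONE_MOVABLE is only eligible when an allocation carries both __GFP_HIGHMEM and __GFP_MOVABLE, and ordinary zone fallback then lets such an allocation be satisfied from either partition. The flag values, the three-zone enum and the sketch_* names are hypothetical stand-ins, not the kernel's real gfp_zone() code.

```c
#include <assert.h>

/* Hypothetical flag values for illustration; the real __GFP_* bits differ. */
#define SKETCH_GFP_HIGHMEM  0x02u
#define SKETCH_GFP_MOVABLE  0x08u

/* Assumed zone layout, lowest to highest. */
enum sketch_zone { SK_ZONE_NORMAL, SK_ZONE_HIGHMEM, SK_ZONE_MOVABLE };

/*
 * Highest zone an allocation may be served from. ZONE_MOVABLE is only
 * eligible when BOTH bits are set; either bit alone falls back.
 */
static enum sketch_zone sketch_gfp_zone(unsigned int gfp)
{
    unsigned int both = SKETCH_GFP_HIGHMEM | SKETCH_GFP_MOVABLE;

    if ((gfp & both) == both)
        return SK_ZONE_MOVABLE;
    if (gfp & SKETCH_GFP_HIGHMEM)
        return SK_ZONE_HIGHMEM;
    return SK_ZONE_NORMAL;
}

/* Zone fallback: an allocation may use its highest eligible zone or any lower one. */
static int sketch_zone_usable(unsigned int gfp, enum sketch_zone z)
{
    return z <= sketch_gfp_zone(gfp);
}
```

Because a movable highmem allocation can still fall back to lower zones, it may land in either partition, while an allocation lacking either flag never reaches ZONE_MOVABLE.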

>> The size of the zone is determined by a kernelcore= parameter specified at
>> boot-time. This specifies how much memory is usable by non-movable allocations
>> and the remainder is used for ZONE_MOVABLE. Any range of pages within
>> ZONE_MOVABLE can be released by migrating the pages or by reclaiming.
>
> The user has to manually fiddle around with the size of the unmovable
> partition until it works?
>

They have to fiddle with the size of the unmovable partition if their 
workload uses more unmovable kernel allocations than expected. This was 
always going to be the restriction with using zones for partitioning 
memory. Resizing zones on the fly is not really an option because the 
resizing would only work reliably in one direction.
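A rough illustration of how a kernelcore= value could translate into a ZONE_MOVABLE size. It assumes 4 KiB base pages and a single pool of movable-eligible memory; the real patches take memory only from the highest populated zone and work per node. The parser mimics the K/M/G suffixes accepted by the kernel's memparse(), and the sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_PAGE_SHIFT 12  /* assumption: 4 KiB base pages */

/* Parse "512M", "2G", "4096" etc. into bytes, in the spirit of memparse(). */
static unsigned long long sketch_parse_size(const char *s)
{
    char *end;
    unsigned long long v = strtoull(s, &end, 0);

    switch (*end) {
    case 'G': case 'g': v <<= 10; /* fall through */
    case 'M': case 'm': v <<= 10; /* fall through */
    case 'K': case 'k': v <<= 10; break;
    default: break;
    }
    return v;
}

/*
 * kernelcore= is the memory kept for non-movable allocations; whatever
 * remains of the eligible memory becomes ZONE_MOVABLE.
 */
static unsigned long sketch_movable_pages(const char *kernelcore,
                                          unsigned long total_pages)
{
    unsigned long core =
        (unsigned long)(sketch_parse_size(kernelcore) >> SKETCH_PAGE_SHIFT);

    if (core >= total_pages)
        return 0;   /* kernelcore covers everything: no ZONE_MOVABLE */
    return total_pages - core;
}
```

For example, with 1 GiB of eligible memory (262144 pages), kernelcore=512M leaves 131072 pages for ZONE_MOVABLE. Since the split is fixed at boot, a workload needing more unmovable memory than that must be rebooted with a larger value.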

The anti-fragmentation code could potentially be used to have subzone 
groups that kept movable and unmovable allocations as far apart as 
possible and at opposite ends of a zone. That approach has been kicked a 
few times because of complexity.

>> When selecting a zone to take pages from for ZONE_MOVABLE, there are two
>> things to consider. First, only memory from the highest populated zone is
>> used for ZONE_MOVABLE. On the x86, this is probably going to be ZONE_HIGHMEM
>> but it would be ZONE_DMA on ppc64 or possibly ZONE_DMA32 on x86_64. Second,
> the amount of memory usable by the kernel will be spread evenly throughout
>> NUMA nodes where possible. If the nodes are not of equal size, the amount
>> of memory usable by the kernel on some nodes may be greater than others.
>
> So how is the amount of movable memory on a node calculated?

Subtle difference. The amount of unmovable memory is calculated per node.

> Evenly
> distributed?

As evenly as possible.

> There are some NUMA architectures that are not that
> symmetric.
>

I know; that's why find_zone_movable_pfns_for_nodes() is as complex as it 
is. The mechanism spreads the unmovable memory evenly across all 
nodes. If some nodes are too small to hold their share, the 
remaining unmovable memory is divided among the larger nodes.
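The distribution described above can be sketched as an iterative even split. Each pass gives every node that still has room an equal share of the outstanding kernelcore. Nodes that fill up drop out, and the leftover is re-divided among the larger nodes. This is a simplified model written for illustration, not the code of find_zone_movable_pfns_for_nodes(). Node sizes are in pages, and the names and the SKETCH_MAX_NODES limit are hypothetical.

```c
#include <assert.h>

#define SKETCH_MAX_NODES 64   /* hypothetical upper bound for the sketch */

/*
 * node_pages[i]: pages on node i in the zone ZONE_MOVABLE is carved from.
 * core_out[i]:   pages on node i kept for non-movable allocations; the
 *                rest (node_pages[i] - core_out[i]) becomes ZONE_MOVABLE.
 * Returns the kernelcore pages that could not be placed (0 normally).
 */
static unsigned long sketch_spread_kernelcore(const unsigned long *node_pages,
                                              unsigned long *core_out,
                                              int nr_nodes,
                                              unsigned long kernelcore)
{
    int full[SKETCH_MAX_NODES] = {0};
    int open_nodes = 0;

    assert(nr_nodes > 0 && nr_nodes <= SKETCH_MAX_NODES);
    for (int i = 0; i < nr_nodes; i++) {
        core_out[i] = 0;
        full[i] = (node_pages[i] == 0);
        open_nodes += !full[i];
    }

    /* Each pass splits what is left evenly over nodes that still have room. */
    while (kernelcore > 0 && open_nodes > 0) {
        unsigned long share = kernelcore / (unsigned long)open_nodes;

        if (share == 0)
            share = 1;  /* fewer pages left than nodes: hand them out singly */

        for (int i = 0; i < nr_nodes && kernelcore > 0; i++) {
            unsigned long room, take;

            if (full[i])
                continue;
            room = node_pages[i] - core_out[i];
            take = share < room ? share : room;
            if (take > kernelcore)
                take = kernelcore;
            core_out[i] += take;
            kernelcore -= take;
            if (core_out[i] == node_pages[i]) {
                full[i] = 1;   /* node too small for its share: all kernelcore */
                open_nodes--;
            }
        }
    }
    return kernelcore;
}
```

With nodes of 100, 1000 and 1000 pages and a kernelcore of 900 pages, the first pass gives the small node all 100 of its pages and each larger node 300. The second pass splits the remaining 200, giving 100/400/400 of kernelcore and leaving 0/600/600 pages per node for ZONE_MOVABLE.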

>> By default, the zone is not as useful for hugetlb allocations because they
>> are pinned and non-migratable (currently at least). A sysctl is provided that
>> allows huge pages to be allocated from that zone. This means that the huge
>> page pool can be resized to the size of ZONE_MOVABLE during the lifetime of
>> the system assuming that pages are not mlocked. Despite huge pages being
>> non-movable, we do not introduce additional external fragmentation of note
>> as huge pages are always the largest contiguous block we care about.
>
> The user already has to specify the partitioning of the system at bootup
> and could take the huge page sizes into account.
>

Not in all cases. Some systems cannot know in advance how many huge pages 
they need because they are batch systems running jobs on request. 
The zone allows an amount of memory to be set aside that can be 
*optionally* used for hugepages if desired or base pages if not. Between 
jobs, the hugepage pool can be resized up to the size of ZONE_MOVABLE.
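A minimal sketch of the ceiling on what the huge page pool can draw from ZONE_MOVABLE between jobs. It assumes every page in the zone can be reclaimed or migrated away (nothing mlocked). The sysctl is modelled as a plain on/off flag because its name is not given here. The huge page size enters only through its order, and the sketch_* names are hypothetical.

```c
#include <assert.h>

/* Huge page order for a given huge page size, e.g. 2 MiB with 4 KiB pages -> 9. */
static unsigned int sketch_huge_order(unsigned long long huge_bytes,
                                      unsigned int page_shift)
{
    unsigned int order = 0;

    assert(huge_bytes >= (1ULL << page_shift));
    while ((1ULL << (page_shift + order)) < huge_bytes)
        order++;
    return order;
}

/*
 * Ceiling on huge pages obtainable from ZONE_MOVABLE, assuming every
 * page in the zone can be freed up. Without the sysctl the zone is off
 * limits to huge pages.
 */
static unsigned long sketch_movable_hugepage_limit(unsigned long movable_pages,
                                                   unsigned int huge_order,
                                                   int sysctl_enabled)
{
    if (!sysctl_enabled)
        return 0;
    return movable_pages >> huge_order;
}
```

For example, 2 MiB huge pages on 4 KiB base pages give order 9, so a 1 GiB ZONE_MOVABLE (262144 pages) could back at most 512 huge pages.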

The other case is eventual support for memory hot-remove. Any memory within 
ZONE_MOVABLE can potentially be off-lined and removed after its pages are 
migrated elsewhere.
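To make the hot-remove point concrete, here is a hypothetical check that a physical page-frame range is a removal candidate only if it lies entirely inside ZONE_MOVABLE. The struct layout (ZONE_MOVABLE assumed to occupy the top of a node's PFN range) and the names are illustrative assumptions. Passing the check only means migration or reclaim can be attempted, not that it will succeed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-node layout: ZONE_MOVABLE spans [movable_start_pfn, end_pfn). */
struct sketch_node {
    unsigned long movable_start_pfn;
    unsigned long end_pfn;
};

/*
 * [start, start + nr) is a hot-remove candidate only if every page in it
 * belongs to ZONE_MOVABLE, so its contents could in principle be migrated
 * or reclaimed before the range is off-lined.
 */
static bool sketch_range_removable(const struct sketch_node *n,
                                   unsigned long start, unsigned long nr)
{
    if (nr == 0)
        return false;
    /* Written as a subtraction to avoid overflow in start + nr. */
    return start >= n->movable_start_pfn &&
           start < n->end_pfn &&
           nr <= n->end_pfn - start;
}
```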

> Also huge pages may have variable sizes that can be specified on bootup
> for IA64. The assumption that a huge page is always the largest
> contiguous block is *not true*.
>

I didn't say they were the largest supported contiguous block; I said they 
were the largest contiguous block we *care* about. Right now, it is 
assumed that variable huge page sizes are not supported at runtime. If they were, 
some smarts would be needed to keep huge pages of the same size together 
to control external fragmentation, but that's about it.

> The huge page sizes on i386 and x86_64 platforms are contingent on
> their page table structure. This can be completely different on other
> platforms.
>

The size doesn't really make much difference to the mechanism.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


Thread overview: 52+ messages
2007-01-25 23:44 Mel Gorman
2007-01-25 23:45 ` [PATCH 1/8] Add __GFP_MOVABLE for callers to flag allocations that may be migrated Mel Gorman
2007-01-26 12:27   ` Nick Piggin
2007-01-26 13:25     ` Mel Gorman
2007-01-25 23:45 ` [PATCH 2/8] Create the ZONE_MOVABLE zone Mel Gorman
2007-01-26 16:28   ` Christoph Lameter
2007-01-26 16:49     ` Mel Gorman
2007-01-29 17:28     ` Mel Gorman
2007-01-26 17:16   ` Christoph Lameter
2007-01-26 17:24     ` Mel Gorman
2007-01-26 17:25       ` Christoph Lameter
2007-01-26 17:38         ` Mel Gorman
2007-01-29 17:31     ` Mel Gorman
2007-01-25 23:45 ` [PATCH 3/8] Allow huge page allocations to use GFP_HIGH_MOVABLE Mel Gorman
2007-01-26 16:33   ` Christoph Lameter
2007-01-26 16:58     ` Mel Gorman
2007-01-26 17:04       ` Christoph Lameter
2007-01-26 17:20         ` Mel Gorman
2007-01-26 17:22           ` Christoph Lameter
2007-01-26 17:37             ` Mel Gorman
2007-01-26 17:45               ` Christoph Lameter
2007-01-26 17:53                 ` Mel Gorman
2007-01-26 18:20                   ` Christoph Lameter
2007-01-26 20:37                     ` Mel Gorman
2007-01-26 18:35                   ` Chris Friesen
2007-01-26 20:44                     ` Mel Gorman
2007-01-26 21:37                       ` Chris Friesen
2007-01-25 23:46 ` [PATCH 4/8] x86 - Specify amount of kernel memory at boot time Mel Gorman
2007-01-25 23:46 ` [PATCH 5/8] ppc and powerpc " Mel Gorman
2007-01-25 23:46 ` [PATCH 6/8] x86_64 " Mel Gorman
2007-01-25 23:47 ` [PATCH 7/8] ia64 " Mel Gorman
2007-01-25 23:47 ` [PATCH 8/8] Add documentation for additional boot parameter and sysctl Mel Gorman
2007-01-26 11:07 ` [PATCH 0/8] Create ZONE_MOVABLE to partition memory between movable and non-movable pages Andrew Morton
2007-01-26 14:29   ` Mel Gorman
2007-01-26 16:01     ` Christoph Lameter
2007-01-26 15:56   ` Christoph Lameter
2007-01-26 19:46     ` Andrew Morton
2007-01-26 19:58       ` Christoph Lameter
2007-01-26 20:27         ` Andrew Morton
2007-01-29 21:54           ` Christoph Lameter
2007-01-29 22:36             ` Andrew Morton
2007-01-29 22:45               ` Christoph Lameter
2007-01-29 22:50                 ` Russell King
2007-01-29 23:37                   ` Christoph Lameter
2007-01-30  0:09                     ` Andrew Morton
2007-01-30  9:53                       ` Peter Zijlstra
2007-02-02  5:27                         ` Christoph Lameter
2007-02-02  5:22                       ` Christoph Lameter
2007-01-26 16:21 ` Christoph Lameter
2007-01-26 16:48   ` Mel Gorman [this message]
2007-01-26 17:02     ` Christoph Lameter
2007-01-26 17:20       ` Mel Gorman
