* Compaction & folios
@ 2021-10-06 22:53 Kent Overstreet
  2021-10-06 23:17 ` Matthew Wilcox
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Kent Overstreet @ 2021-10-06 22:53 UTC
  To: linux-kernel, linux-mm; +Cc: hannes, willy, rientjes

So I have some observations on memory compaction & hugepages.

Right now, the working assumption in MM is that compaction is hard and
expensive, and today it is - because most allocations are order 0, with a
small subset being hugepage order allocations. This means any time we need a
hugepage, compaction has to move a bunch of order 0 pages around, and memory
reclaim is no help here - when we reclaim memory, it comes back as fragmented
order 0 pages.

But what if compaction wasn't such a difficult, expensive operation?

With folios, and then folios for anonymous pages, we won't see nearly so many
order 0 allocations anymore - we'll see a spread of allocation sizes based on a
mixture of application usage patterns - something much closer to a Poisson
distribution, vs. our current very bimodal distribution. And since we won't be
fragmenting all our allocations up front, memory reclaim will be freeing
allocations in this same distribution.

Which means that any time an order n allocation fails, it's likely that we'll
still have order n-1 pages free - and of those free order n-1 pages, one will
likely have a buddy that's movable and hasn't been fragmented - meaning the
common case is that compaction will have to move _one_ (higher order) page -
we'll almost never have to move a bunch of 4k pages.
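
A minimal sketch of the buddy arithmetic behind this argument. It assumes the
usual buddy-allocator rule that a block's buddy differs from it in exactly one
PFN bit. The function names and the "one migration per allocation" cost model
are hypothetical, chosen only for illustration:

```c
#include <assert.h>

/* PFN of the buddy of the order-`order` block that starts at `pfn`. */
unsigned long buddy_pfn(unsigned long pfn, unsigned int order)
{
	/* A block of a given order is always aligned to its own size. */
	assert((pfn & ((1UL << order) - 1)) == 0);
	return pfn ^ (1UL << order);
}

/* Start PFN of the order+1 block formed when a block and its buddy merge. */
unsigned long merged_pfn(unsigned long pfn, unsigned int order)
{
	return pfn & ~(1UL << order);
}

/*
 * Hypothetical cost model: number of migrations needed to empty a buddy of
 * order `buddy_order` that is fully occupied by movable allocations of
 * order `alloc_order`, counting one migration per allocation.
 */
unsigned long migrations_needed(unsigned int buddy_order,
				unsigned int alloc_order)
{
	assert(alloc_order <= buddy_order);
	return 1UL << (buddy_order - alloc_order);
}
```

Under this model, emptying an order 8 buddy that holds only 4k pages takes 256
migrations, while emptying one that holds a single order 8 allocation takes
one.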

Another way of thinking of this is that memory reclaim will be doing most of the
work that compaction has to do now to allocate a high order page. Compaction
will go from an expensive, somewhat unreliable operation to one that mostly just
works - it's going to be _much_ less of a pain point.

It may turn out that allocating hugepages still doesn't work as reliably as we'd
like - but folios are still a big help even when we can't allocate a 2MB page,
because we'll be able to fall back to an order 6 or 7 or 8 allocation, which is
something we can't do now. And, since multiple CPU vendors now support
coalescing contiguous PTEs in the TLB, this will still get us most of the
performance benefits of using hugepages.
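
A rough sketch of what such a fallback could look like, assuming 4KB base
pages and a 2MB (order 9) huge page. The callback type, alloc_with_fallback()
and only_up_to_order_7() are hypothetical stand-ins, not kernel interfaces:

```c
#include <assert.h>

#define BASE_PAGE_SHIFT	12	/* assumption: 4KB base pages */
#define HUGE_ORDER	9	/* assumption: 2MB huge pages */

/* Size in KB of a single allocation of the given order. */
unsigned long order_size_kb(unsigned int order)
{
	return (1UL << (order + BASE_PAGE_SHIFT)) >> 10;
}

/* Hypothetical callback: nonzero if an allocation of `order` succeeds. */
typedef int (*try_alloc_fn)(unsigned int order);

/*
 * Try the huge page order first, then each smaller order down to
 * `min_order`. Returns the order that succeeded, or -1 if none did.
 */
int alloc_with_fallback(try_alloc_fn try_alloc, unsigned int min_order)
{
	assert(min_order <= HUGE_ORDER);
	for (unsigned int order = HUGE_ORDER; ; order--) {
		if (try_alloc(order))
			return (int)order;
		if (order == min_order)
			break;
	}
	return -1;
}

/* Stand-in for a fragmented system where nothing above order 7 is free. */
int only_up_to_order_7(unsigned int order)
{
	return order <= 7;
}
```

With these assumptions, orders 6, 7 and 8 correspond to 256KB, 512KB and 1MB,
and alloc_with_fallback(only_up_to_order_7, 6) settles on order 7.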


* Re: Compaction & folios
From: Matthew Wilcox @ 2021-10-06 23:17 UTC
  To: Kent Overstreet; +Cc: linux-kernel, linux-mm, hannes, rientjes

On Wed, Oct 06, 2021 at 06:53:41PM -0400, Kent Overstreet wrote:
> It may turn out that allocating hugepages still doesn't work as reliably as we'd
> like - but folios are still a big help even when we can't allocate a 2MB page,
> because we'll be able to fall back to an order 6 or 7 or 8 allocation, which is
> something we can't do now. And, since multiple CPU vendors now support
> coalescing contiguous PTE entries in the TLB, this will still get us most of the
> performance benefits of using hugepages.

I'd like to add two things:

1. A lot of people talk about the performance improvements from using
2MB pages, and there are the obvious hardware ones -- one fewer level
to dereference in the page table walk when there's a TLB miss; using a
single TLB entry to cache an entire 2MB page.

But there are also software ones, which I believe Google have measured
(perhaps it was the ChromeOS team?). Allocating order-2/3/4 pages reduces
the length of the LRU list by a factor of 4/8/16.  That means we get 4-16x
memory reclaimed per unit of time, which reduces the LRU lock contention.
Not to mention the advantage of being able to use a pagevec to describe
960KB of memory rather than 60KB.

2. We can only measure what CPUs do today.  If our behaviour changes,
CPU vendors will adapt.  I talked to someone who dabbles in hardware
design who said that it really isn't that hard to design a TLB that
can support mapping 64KB entries at arbitrary 4KB offsets.  There's no
particular incentive for CPU manufacturers to do that today, but if we
start allocating 64KB pages to cache files, that will change.
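
The numbers in point 1 can be checked with a small sketch. It assumes 4KB
base pages and a 15-entry pagevec (which is what the 60KB figure implies);
the helper names are hypothetical:

```c
#include <assert.h>

#define BASE_PAGE_KB	4UL	/* assumption: 4KB base pages */
#define PAGEVEC_SIZE	15UL	/* assumption: 15 entries, i.e. 60KB at order 0 */

/* KB of memory one full pagevec describes if every entry has this order. */
unsigned long pagevec_coverage_kb(unsigned int order)
{
	return PAGEVEC_SIZE * (BASE_PAGE_KB << order);
}

/* LRU entries needed to track `mem_kb` of memory at one uniform order. */
unsigned long lru_entries(unsigned long mem_kb, unsigned int order)
{
	unsigned long entry_kb = BASE_PAGE_KB << order;

	/* Keep the arithmetic exact: only whole entries are modelled. */
	assert(mem_kb % entry_kb == 0);
	return mem_kb / entry_kb;
}
```

For 1GB of page cache, lru_entries() gives 262144 entries at order 0, 65536
at order 2 and 16384 at order 4, and pagevec_coverage_kb(4) gives the 960KB
mentioned above.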


* Re: Compaction & folios
From: Kirill A. Shutemov @ 2021-10-07  9:15 UTC
  To: Kent Overstreet, Vlastimil Babka
  Cc: linux-kernel, linux-mm, hannes, willy, rientjes

On Wed, Oct 06, 2021 at 06:53:41PM -0400, Kent Overstreet wrote:
> So I have some observations on memory compaction & hugepages.
> 
> Right now, the working assumption in MM is that compaction is hard and
> expensive, and right now it is - because most allocations are order 0, with a
> small subset being hugepage order allocations. This means any time we need a
> hugepage, compaction has to move a bunch of order 0 pages around, and memory
> reclaim is no help here - when we reclaim memory, it's coming back as fragmented
> order 0 pages.
> 
> But what if compaction wasn't such a difficult, expensive operation?
> 
> With folios, and then folios for anonymous pages, we won't see nearly so many
> order 0 allocations anymore - we'll see a spread of allocation sizes based on a
> mixture of application usage patterns - something much closer to a poisson
> distribution, vs. our current very bimodal distribution. And since we won't be
> fragmenting all our allocations up front, memory reclaim will be freeing
> allocations in this same distribution.
> 
> Which means that any time an order n allocation fails, it's likely that we'll
> still have order n-1 pages free - and of those free order n-1 pages, one will
> likely have a buddy that's moveable and hasn't been fragmented - meaning the
> common case is that compaction will have to move _one_ (higher order) page -
> we'll almost never be having to move a bunch of 4k pages.
> 
> Another way of thinking of this is that memory reclaim will be doing most of the
> work that compaction has to do now to allocate a high order page. Compaction
> will go from an expensive, somewhat unreliable operation to one that mostly just
> works - it's going to be _much_ less of a pain point.
> 
> It may turn out that allocating hugepages still doesn't work as reliably as we'd
> like - but folios are still a big help even when we can't allocate a 2MB page,
> because we'll be able to fall back to an order 6 or 7 or 8 allocation, which is
> something we can't do now. And, since multiple CPU vendors now support
> coalescing contiguous PTE entries in the TLB, this will still get us most of the
> performance benefits of using hugepages.

Compaction is at the moment built on the assumption that compound pages are
PMD-mappable or larger and that it doesn't make sense to move them:

		/*
		 * Regardless of being on LRU, compound pages such as THP and
		 * hugetlbfs are not to be compacted unless we are attempting
		 * an allocation much larger than the huge page size (eg CMA).
		 * We can potentially save a lot of iterations if we skip them
		 * at once. The check is racy, but we can consider only valid
		 * values and the only danger is skipping too much.
		 */
		if (PageCompound(page) && !cc->alloc_contig) {
			const unsigned int order = compound_order(page);

			if (likely(order < MAX_ORDER))
				low_pfn += (1UL << order) - 1;
			goto isolate_fail;
		}

With a direct conversion, this will also apply to folios.

It has to be reworked sooner rather than later if we want to be more
flexible about folio sizes; otherwise we risk making the compaction
situation worse.
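
To illustrate the concern, here is a simplified userspace model of the skip
in the check quoted above. struct sim_page and both functions are
hypothetical; the real scanner works on struct page and does far more than
this:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified page descriptor for the simulation. */
struct sim_page {
	unsigned int order;	/* 0 = base page, >0 = head of a compound page */
};

/*
 * Count the pages in [0, nr_pages) that the scanner would still consider
 * for isolation when every compound page is skipped in one step, the way
 * low_pfn += (1UL << order) - 1 does in the quoted check.
 */
unsigned long count_isolation_candidates(const struct sim_page *pages,
					 unsigned long nr_pages)
{
	unsigned long candidates = 0;

	assert(pages != NULL);
	for (unsigned long pfn = 0; pfn < nr_pages; pfn++) {
		unsigned int order = pages[pfn].order;

		if (order > 0) {
			/* Jump to the last tail page; the loop steps past it. */
			pfn += (1UL << order) - 1;
			continue;
		}
		candidates++;
	}
	return candidates;
}

/* Usage: 16 pages holding an order 2 folio, 4 base pages, an order 3 folio. */
unsigned long example_mixed_range(void)
{
	struct sim_page pages[16] = { { 0 } };

	pages[0].order = 2;	/* covers PFNs 0-3 */
	pages[8].order = 3;	/* covers PFNs 8-15 */
	return count_isolation_candidates(pages, 16);
}
```

In this example only the 4 base pages remain candidates. The 12 pages that
belong to small folios are never considered for migration, even though they
are far smaller than a PMD-sized page.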

-- 
 Kirill A. Shutemov


* Re: Compaction & folios
From: Vlastimil Babka @ 2021-10-07 10:06 UTC
  To: Kent Overstreet, linux-kernel, linux-mm
  Cc: hannes, willy, rientjes, Mel Gorman, Kirill A. Shutemov

On 10/7/21 00:53, Kent Overstreet wrote:
> So I have some observations on memory compaction & hugepages.
> 
> Right now, the working assumption in MM is that compaction is hard and
> expensive, and right now it is - because most allocations are order 0, with a
> small subset being hugepage order allocations. This means any time we need a
> hugepage, compaction has to move a bunch of order 0 pages around, and memory
> reclaim is no help here - when we reclaim memory, it's coming back as fragmented
> order 0 pages.
> 
> But what if compaction wasn't such a difficult, expensive operation?
> 
> With folios, and then folios for anonymous pages, we won't see nearly so many
> order 0 allocations anymore - we'll see a spread of allocation sizes based on a
> mixture of application usage patterns - something much closer to a poisson
> distribution, vs. our current very bimodal distribution. And since we won't be
> fragmenting all our allocations up front, memory reclaim will be freeing
> allocations in this same distribution.

Unfortunately, the main problem with compaction is not the act of moving a
number of LRU pages, but rather the presence of unmovable pages (slab, page
tables and other kernel allocations), where a single such page makes the
whole 2MB block unusable. So I don't expect this would help dramatically for
compaction, but the points added by Matthew would still apply.
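
A toy model of this point, assuming a 2MB block made of 512 4KB pages. The
enum and both functions are hypothetical and only illustrate the "one
unmovable page spoils the block" rule:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BLOCK_NR_PAGES	512	/* assumption: 2MB block of 4KB pages */

/* Hypothetical per-page state for the simulation. */
enum sim_page_state { SIM_FREE, SIM_MOVABLE, SIM_UNMOVABLE };

/*
 * A block can be turned into one free 2MB page only if every page in it
 * is either already free or can be migrated away.
 */
bool block_can_be_freed(const enum sim_page_state *block)
{
	assert(block != NULL);
	for (size_t i = 0; i < BLOCK_NR_PAGES; i++) {
		if (block[i] == SIM_UNMOVABLE)
			return false;
	}
	return true;
}

/* Usage: a fully movable block, then the same block with one slab page. */
bool example_single_unmovable_page(void)
{
	enum sim_page_state block[BLOCK_NR_PAGES];
	bool before, after;

	for (size_t i = 0; i < BLOCK_NR_PAGES; i++)
		block[i] = SIM_MOVABLE;
	before = block_can_be_freed(block);

	block[137] = SIM_UNMOVABLE;	/* arbitrary index, e.g. one slab page */
	after = block_can_be_freed(block);

	return before && !after;
}
```

However large or small the movable allocations in the block are, the result
only changes once the unmovable page is gone.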

