LKML Archive on
help / color / mirror / Atom feed
From: "John Stoffel" <>
To: Rik van Riel <>
	KOSAKI Motohiro <>,
	Lee Schermerhorn <>,
Subject: Re: [patch 00/21] VM pageout scalability improvements
Date: Thu, 28 Feb 2008 15:14:02 -0500	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <>

Rik> On large memory systems, the VM can spend way too much time
Rik> scanning through pages that it cannot (or should not) evict from
Rik> memory. Not only does it use up CPU time, but it also provokes
Rik> lock contention and can leave large systems under memory presure
Rik> in a catatonic state.

Nitpicky, but what is a large memory system?  I read your web page and
you talk about large memory being greater than several Gb, and about
huge systems (> 128gb).  So which is this patch addressing?  

I ask because I've got a new system with 4Gb of RAM and my motherboard
can goto 8Gb.  Should this be a large memory system or not?  I've also
only got a single dual core CPU, how does that affect things?

You talk about the Inactive list in the Anonymous memory section, and
about limiting it.  You say 30% on a 1Gb system, but 1% on a 1Tb
system, which is interesting numbers but it's not clear where they
come from.

Should the IO limits (raised lower down in the document) be a more
core feature?  I.e. if you only have 20MBytes/sec bandwidth to disk
for swap, should you be limiting the inactive list to 5seconds of
bandwidth in terms of size?  Or 10s, or 60s?  

Should we be more aggresive in pre-swapping Anonymous memory to swap,
but keeping it cached in memory for use?  If there's pressure, it
seems like it would be easy to just dump pre-swapped pages from the
inactive list, without having to spend time writing them out.

Also, how does having more CPUs/IO bandwidth change things?  Do we
need an exponential backoff algorithm in terms of how much memory is
allocated to the various lists?  As memory gets bigger and bigger, do
we allocated fewer and fewer pages since we can't swap them out fast

I dunno... I honestly don't have the time or the knowledge to do more
than poke sticks into things and see what happens.  And to ask
annoying questions.  

I do appreciate your work on this.


Rik> Against 2.6.24-rc6-mm1

Rik> This patch series improves VM scalability by:

Rik> 1) making the locking a little more scalable

Rik> 2) putting filesystem backed, swap backed and non-reclaimable pages
Rik>    onto their own LRUs, so the system only scans the pages that it
Rik>    can/should evict from memory

Rik> 3) switching to SEQ replacement for the anonymous LRUs, so the
Rik>    number of pages that need to be scanned when the system
Rik>    starts swapping is bound to a reasonable number

Rik> More info on the overall design can be found at:


Rik> Changelog:
Rik> - pull the memcontrol lru arrayification earlier into the patch series
Rik> - use a pagevec array similar to the lru array
Rik> - clean up the code in various places
Rik> - improved pageout balancing and reduced pageout cpu use

Rik> - fix compilation on PPC and without memcontrol
Rik> - make page_is_pagecache more readable
Rik> - replace get_scan_ratio with correct version

Rik> - merge memcontroller split LRU code into the main split LRU patch,
Rik>   since it is not functionally different (it was split up only to help
Rik>   people who had seen the last version of the patch series review it)
Rik> - drop the page_file_cache debugging patch, since it never triggered
Rik> - reintroduce code to not scan anon list if swap is full
Rik> - add code to scan anon list if page cache is very small already
Rik> - use lumpy reclaim more aggressively for smaller order > 1 allocations

Rik> -- 
Rik> All Rights Reversed

Rik> --
Rik> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Rik> the body of a message to
Rik> More majordomo info at
Rik> Please read the FAQ at

Rik> !DSPAM:47c70f4e50261498712856!

  parent reply	other threads:[~2008-02-28 20:15 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-02-28 19:29 Rik van Riel
2008-02-28 19:29 ` [patch 01/21] move isolate_lru_page() to vmscan.c Rik van Riel
2008-02-29  2:29   ` KOSAKI Motohiro
2008-02-29  2:41     ` Rik van Riel
2008-02-29  2:47       ` KOSAKI Motohiro
2008-02-28 19:29 ` [patch 02/21] Use an indexed array for LRU variables Rik van Riel
2008-02-29 16:03   ` Andy Whitcroft
2008-03-03 18:57     ` Rik van Riel
2008-02-28 19:29 ` [patch 03/21] use an array for the LRU pagevecs Rik van Riel
2008-02-29 15:40   ` Andy Whitcroft
2008-03-01  7:02     ` KOSAKI Motohiro
2008-03-04 11:04       ` KOSAKI Motohiro
2008-03-04 20:38         ` Rik van Riel
2008-03-05  1:38           ` KOSAKI Motohiro
2008-02-28 19:29 ` [patch 04/21] free swap space on swap-in/activation Rik van Riel
2008-02-28 20:05   ` Lee Schermerhorn
2008-02-28 20:20     ` Rik van Riel
2008-02-28 19:29 ` [patch 05/21] define page_file_cache() function Rik van Riel
2008-02-29 11:53   ` KOSAKI Motohiro
2008-02-28 19:29 ` [patch 06/21] split LRU lists into anon & file sets Rik van Riel
2008-03-01 12:13   ` KOSAKI Motohiro
2008-03-01 12:46   ` KOSAKI Motohiro
2008-02-28 19:29 ` [patch 07/21] SEQ replacement for anonymous pages Rik van Riel
2008-03-03 10:50   ` barrioskmc@gmail
2008-02-28 19:29 ` [patch 08/21] (NEW) add some sanity checks to get_scan_ratio Rik van Riel
2008-03-04 10:40   ` minchan Kim
2008-02-28 19:29 ` [patch 09/21] (NEW) improve reclaim balancing Rik van Riel
2008-03-01 13:35   ` KOSAKI Motohiro
2008-03-03 19:26     ` Rik van Riel
2008-02-28 19:29 ` [patch 10/21] add newly swapped in pages to the inactive list Rik van Riel
2008-02-28 19:29 ` [patch 11/21] (NEW) more aggressively use lumpy reclaim Rik van Riel
2008-03-02 10:35   ` KOSAKI Motohiro
2008-03-02 14:23     ` Rik van Riel
2008-02-28 19:29 ` [patch 12/21] No Reclaim LRU Infrastructure Rik van Riel
     [not found]   ` <>
2008-02-29 14:48     ` Lee Schermerhorn
     [not found]       ` <>
2008-03-03  3:06         ` minchan Kim
2008-03-03 18:46         ` Rik van Riel
2008-03-03 23:38           ` barrioskmc@gmail
2008-03-04  1:55             ` Rik van Riel
2008-03-04 10:46   ` KOSAKI Motohiro
2008-03-04 15:05     ` Lee Schermerhorn
2008-03-04 21:21       ` Rik van Riel
2008-03-05  1:42       ` KOSAKI Motohiro
2008-02-28 19:29 ` [patch 13/21] Non-reclaimable page statistics Rik van Riel
2008-02-28 19:29 ` [patch 14/21] scan noreclaim list for reclaimable pages Rik van Riel
2008-02-28 23:41   ` Randy Dunlap
2008-02-29 14:38     ` Lee Schermerhorn
2008-02-28 19:29 ` [patch 15/21] ramfs pages are non-reclaimable Rik van Riel
2008-02-28 19:29 ` [patch 16/21] SHM_LOCKED pages are nonreclaimable Rik van Riel
2008-02-28 19:29 ` [patch 17/21] non-reclaimable mlocked pages Rik van Riel
     [not found]   ` <>
2008-02-29 14:47     ` Lee Schermerhorn
2008-02-28 19:29 ` [patch 18/21] mlock vma pages under mmap_sem held for read Rik van Riel
2008-02-28 19:29 ` [patch 19/21] handle mlocked pages during map/unmap and truncate Rik van Riel
2008-02-28 19:29 ` [patch 20/21] account mlocked pages Rik van Riel
2008-02-28 19:29 ` [patch 21/21] cull non-reclaimable anon pages from the LRU at fault time Rik van Riel
2008-02-28 20:19   ` Lee Schermerhorn
2008-02-28 22:27     ` Rik van Riel
2008-02-28 19:49 ` [patch 00/21] VM pageout scalability improvements Rik van Riel
2008-02-28 20:14 ` John Stoffel [this message]
2008-02-28 20:23   ` Rik van Riel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \
    --subject='Re: [patch 00/21] VM pageout scalability improvements' \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).