LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Andrew Morton <akpm@osdl.org>
To: Christoph Lameter <clameter@sgi.com>
Cc: menage@google.com, linux-kernel@vger.kernel.org,
nickpiggin@yahoo.com.au, linux-mm@kvack.org, ak@suse.de,
pj@sgi.com, dgc@sgi.com
Subject: Re: [RFC 0/8] Cpuset aware writeback
Date: Tue, 16 Jan 2007 15:40:54 -0800 [thread overview]
Message-ID: <20070116154054.e655f75c.akpm@osdl.org> (raw)
In-Reply-To: <Pine.LNX.4.64.0701161407530.3545@schroedinger.engr.sgi.com>
> On Tue, 16 Jan 2007 14:15:56 -0800 (PST) Christoph Lameter <clameter@sgi.com> wrote:
>
> ...
>
> > > This may result in a large percentage of a cpuset
> > > to become dirty without writeout being triggered. Under NFS
> > > this can lead to OOM conditions.
> >
> > OK, a big question: is this patchset a performance improvement or a
> > correctness fix? Given the above, and the lack of benchmark results I'm
> > assuming it's for correctness.
>
> It is a correctness fix both for NFS OOM and doing proper cpuset writeout.
It's a workaround for a still-unfixed NFS problem.
> > - Why does NFS go oom? Because it allocates potentially-unbounded
> > numbers of requests in the writeback path?
> >
> > It was able to go oom on non-numa machines before dirty-page-tracking
> > went in. So a general problem has now become specific to some NUMA
> > setups.
>
>
> Right. The issue is that large portions of memory become dirty /
> writeback since no writeback occurs because dirty limits are not checked
> for a cpuset. Then NFS attempt to writeout when doing LRU scans but is
> unable to allocate memory.
>
> > So an obvious, equivalent and vastly simpler "fix" would be to teach
> > the NFS client to go off-cpuset when trying to allocate these requests.
>
> Yes we can fix these allocations by allowing processes to allocate from
> other nodes. But then the container function of cpusets is no longer
> there.
But that's what your patch already does!
It asks pdflush to write the pages instead of the direct-reclaim caller.
The only reason pdflush doesn't go oom is that pdflush lives outside the
direct-reclaim caller's cpuset and is hence able to obtain those nfs
requests from off-cpuset zones.
> > (But is it really bad? What actual problems will it cause once NFS is fixed?)
>
> NFS is okay as far as I can tell. dirty throttling works fine in non
> cpuset environments because we throttle if 40% of memory becomes dirty or
> under writeback.
Repeat: NFS shouldn't go oom. It should fail the allocation, recover, wait
for existing IO to complete. Back that up with a mempool for NFS requests
and the problem is solved, I think?
> > I don't understand why the proposed patches are cpuset-aware at all. This
> > is a per-zone problem, and a per-zone fix would seem to be appropriate, and
> > more general. For example, i386 machines can presumably get into trouble
> > if all of ZONE_DMA or ZONE_NORMAL get dirty. A good implementation would
> > address that problem as well. So I think it should all be per-zone?
>
> No. A zone can be completely dirty as long as we are allowed to allocate
> from other zones.
But we also can get into trouble if a *zone* is all-dirty. Any solution to
the cpuset problem should solve that problem too, no?
> > Do we really need those per-inode cpumasks? When page reclaim encounters a
> > dirty page on the zone LRU, we automatically know that page->mapping->host
> > has at least one dirty page in this zone, yes? We could immediately ask
>
> Yes, but when we enter reclaim most of the pages of a zone may already be
> dirty/writeback so we fail.
No. If the dirty limits become per-zone then no zone will ever have >40%
dirty.
The obvious fix here is: when a zone hits 40% dirty, perform dirty-memory
reduction in that zone, throttling the dirtying process. I suspect this
would work very badly in common situations with, say, typical i386 boxes.
next prev parent reply other threads:[~2007-01-16 23:41 UTC|newest]
Thread overview: 110+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-01-16 5:47 Christoph Lameter
2007-01-16 5:47 ` [RFC 1/8] Convert higest_possible_node_id() into nr_node_ids Christoph Lameter
2007-01-16 22:05 ` Andi Kleen
2007-01-17 3:14 ` Christoph Lameter
2007-01-17 4:15 ` Andi Kleen
2007-01-17 4:23 ` Christoph Lameter
2007-01-16 5:47 ` [RFC 2/8] Add a map to inodes to track dirty pages per node Christoph Lameter
2007-01-16 5:47 ` [RFC 3/8] Add a nodemask to pdflush functions Christoph Lameter
2007-01-16 5:48 ` [RFC 4/8] Per cpuset dirty ratio handling and writeout Christoph Lameter
2007-01-16 5:48 ` [RFC 5/8] Make writeout during reclaim cpuset aware Christoph Lameter
2007-01-16 22:07 ` Andi Kleen
2007-01-17 4:20 ` Paul Jackson
2007-01-17 4:28 ` Andi Kleen
2007-01-17 4:36 ` Paul Jackson
2007-01-17 5:59 ` Andi Kleen
2007-01-17 6:19 ` Christoph Lameter
2007-01-17 4:23 ` Christoph Lameter
2007-01-16 5:48 ` [RFC 6/8] Throttle vm writeout per cpuset Christoph Lameter
2007-01-16 5:48 ` [RFC 7/8] Exclude unreclaimable pages from dirty ration calculation Christoph Lameter
2007-01-18 15:48 ` Nikita Danilov
2007-01-18 19:56 ` Christoph Lameter
2007-01-16 5:48 ` [RFC 8/8] Reduce inode memory usage for systems with a high MAX_NUMNODES Christoph Lameter
2007-01-16 19:52 ` Paul Menage
2007-01-16 20:00 ` Christoph Lameter
2007-01-16 20:06 ` Paul Menage
2007-01-16 20:51 ` Christoph Lameter
2007-01-16 7:38 ` [RFC 0/8] Cpuset aware writeback Peter Zijlstra
2007-01-16 20:10 ` Christoph Lameter
2007-01-16 9:25 ` Paul Jackson
2007-01-16 17:13 ` Christoph Lameter
2007-01-16 21:53 ` Andrew Morton
2007-01-16 22:08 ` [PATCH] nfs: fix congestion control Peter Zijlstra
2007-01-16 22:27 ` Trond Myklebust
2007-01-17 2:41 ` Peter Zijlstra
2007-01-17 6:15 ` Trond Myklebust
2007-01-17 8:49 ` Peter Zijlstra
2007-01-17 13:50 ` Trond Myklebust
2007-01-17 14:29 ` Peter Zijlstra
2007-01-17 14:45 ` Trond Myklebust
2007-01-17 20:05 ` Christoph Lameter
2007-01-17 21:52 ` Peter Zijlstra
2007-01-17 21:54 ` Trond Myklebust
2007-01-18 13:27 ` Peter Zijlstra
2007-01-18 15:49 ` Trond Myklebust
2007-01-19 9:33 ` Peter Zijlstra
2007-01-19 13:07 ` Peter Zijlstra
2007-01-19 16:51 ` Trond Myklebust
2007-01-19 17:54 ` Peter Zijlstra
2007-01-19 17:20 ` Christoph Lameter
2007-01-19 17:57 ` Peter Zijlstra
2007-01-19 18:02 ` Christoph Lameter
2007-01-19 18:26 ` Trond Myklebust
2007-01-19 18:27 ` Christoph Lameter
2007-01-20 7:01 ` [PATCH] nfs: fix congestion control -v3 Peter Zijlstra
2007-01-22 16:12 ` Trond Myklebust
2007-01-25 15:32 ` [PATCH] nfs: fix congestion control -v4 Peter Zijlstra
2007-01-26 5:02 ` Andrew Morton
2007-01-26 8:00 ` Peter Zijlstra
2007-01-26 8:50 ` Peter Zijlstra
2007-01-26 5:09 ` Andrew Morton
2007-01-26 5:31 ` Christoph Lameter
2007-01-26 6:04 ` Andrew Morton
2007-01-26 6:53 ` Christoph Lameter
2007-01-26 8:03 ` Peter Zijlstra
2007-01-26 8:51 ` Andrew Morton
2007-01-26 9:01 ` Peter Zijlstra
2007-02-20 12:59 ` Peter Zijlstra
2007-01-22 17:59 ` [PATCH] nfs: fix congestion control -v3 Christoph Lameter
2007-01-17 23:15 ` [PATCH] nfs: fix congestion control Christoph Hellwig
2007-01-16 22:15 ` [RFC 0/8] Cpuset aware writeback Christoph Lameter
2007-01-16 23:40 ` Andrew Morton [this message]
2007-01-17 0:16 ` Christoph Lameter
2007-01-17 1:07 ` Andrew Morton
2007-01-17 1:30 ` Christoph Lameter
2007-01-17 2:34 ` Andrew Morton
2007-01-17 3:40 ` Christoph Lameter
2007-01-17 4:02 ` Paul Jackson
2007-01-17 4:05 ` Andrew Morton
2007-01-17 6:27 ` Christoph Lameter
2007-01-17 7:00 ` Andrew Morton
2007-01-17 8:01 ` Paul Jackson
2007-01-17 9:57 ` Andrew Morton
2007-01-17 19:43 ` Christoph Lameter
2007-01-17 22:10 ` Andrew Morton
2007-01-18 1:10 ` Christoph Lameter
2007-01-18 1:25 ` Andrew Morton
2007-01-18 5:21 ` Christoph Lameter
2007-01-16 23:44 ` David Chinner
2007-01-16 22:01 ` Andi Kleen
2007-01-16 22:18 ` Christoph Lameter
2007-02-02 1:38 ` Ethan Solomita
2007-02-02 2:16 ` Christoph Lameter
2007-02-02 4:03 ` Andrew Morton
2007-02-02 5:29 ` Christoph Lameter
2007-02-02 6:02 ` Neil Brown
2007-02-02 6:17 ` Christoph Lameter
2007-02-02 6:41 ` Neil Brown
2007-02-02 7:12 ` Andrew Morton
2007-03-21 21:11 ` Ethan Solomita
2007-03-21 21:29 ` Christoph Lameter
2007-03-21 21:52 ` Andrew Morton
2007-03-21 21:57 ` Christoph Lameter
2007-04-19 2:07 ` Ethan Solomita
2007-04-19 2:55 ` Christoph Lameter
2007-04-19 7:52 ` Ethan Solomita
2007-04-19 16:03 ` Christoph Lameter
2007-04-21 1:37 ` Ethan Solomita
2007-04-21 1:48 ` Christoph Lameter
2007-04-21 8:15 ` Ethan Solomita
2007-04-21 15:40 ` Christoph Lameter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20070116154054.e655f75c.akpm@osdl.org \
--to=akpm@osdl.org \
--cc=ak@suse.de \
--cc=clameter@sgi.com \
--cc=dgc@sgi.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=menage@google.com \
--cc=nickpiggin@yahoo.com.au \
--cc=pj@sgi.com \
--subject='Re: [RFC 0/8] Cpuset aware writeback' \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).