LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Christoph Lameter <clameter@sgi.com>
To: Andrew Morton <akpm@osdl.org>
Cc: menage@google.com, linux-kernel@vger.kernel.org,
	nickpiggin@yahoo.com.au, linux-mm@kvack.org, ak@suse.de,
	pj@sgi.com, dgc@sgi.com
Subject: Re: [RFC 0/8] Cpuset aware writeback
Date: Tue, 16 Jan 2007 17:30:26 -0800 (PST)	[thread overview]
Message-ID: <Pine.LNX.4.64.0701161709490.4455@schroedinger.engr.sgi.com> (raw)
In-Reply-To: <20070116170734.947264f2.akpm@osdl.org>

On Tue, 16 Jan 2007, Andrew Morton wrote:

> Nope.  You've completely omitted the little fact that we'll do writeback in
> the offending zone off the LRU.  Slower, maybe.  But it should work and the
> system should recover.  If it's not doing that (it isn't) then we should
> fix it rather than avoiding it (by punting writeback over to pdflush).

pdflush is not running *at* all nor is dirty throttling working. That is 
correct behavior? We could do background writeback but we choose not to do 
so? Instead we wait until we hit reclaim and then block (well it seems 
that we do not block the blocking there also fails since we again check 
global ratios)?

> > The patchset does not allow processes to allocate from other nodes than 
> > the current cpuset.
> 
> Yes it does.  It asks pdflush to perform writeback of the offending zone(s)
> rather than (or as well as) doing it directly.  The only reason pdflush can
> sucessfuly do that is because pdflush can allocate its requests from other
> zones.

Ok pdflush is able to do that. Still the application is not able to 
extend its memory beyond the cpuset. What about writeback throttling? 
There it all breaks down. The cpuset is effective and we are unable to 
allocate any more memory. 

The reason this works is because not all of memory is dirty. Thus reclaim 
will be able to free up memory or there is enough memory free.

> > AFAIK any filesyste/block device can go oom with the current broken 
> > writeback it just does a few allocations. Its a matter of hitting the 
> > sweet spots.
> 
> That shouldn't be possible, in theory.  Block IO is supposed to succeed if
> *all memory in the machine is dirty*: the old
> dirty-everything-with-MAP_SHARED-then-exit problem.  Lots of testing went
> into that and it works.  It also failed on NFS although I thought that got
> "fixed" a year or so ago.  Apparently not.

Humm... Really?

> > Nope. Why would a dirty zone pose a problem? The proble exist if you 
> > cannot allocate more memory.
> 
> Well one example would be a GFP_KERNEL allocation on a highmem machine in
> whcih all of ZONE_NORMAL is dirty.

That is a restricted allocation which will lead to reclaim.

> > If we have multiple zones then other zones may still provide memory to 
> > continue (same as in UP).
> 
> Not if all the eligible zones are all-dirty.

They are all dirty if we do not throttle the dirty pages.

> Right now, what we have is an NFS bug.  How about we fix it, then
> reevaluate the situation?

The "NFS bug" only exists when using a cpuset. If you run NFS without 
cpusets then the throttling will kick in and everything is fine.

> A good starting point would be to show us one of these oom-killer traces.

No traces. Since the process is killed within a cpuset we only get 
messages like:

Nov 28 16:19:52 ic4 kernel: Out of Memory: Kill process 679783 (ncks) score 0 and children.
Nov 28 16:19:52 ic4 kernel: No available memory in cpuset: Killed process 679783 (ncks).
Nov 28 16:27:58 ic4 kernel: oom-killer: gfp_mask=0x200d2, order=0

Probably need to rerun these with some patches.

> > Lets say we have a cpuset with 4 nodes (thus 4 zones) and we are running 
> > on the first node. Then we copy a large file to disk. Node local 
> > allocation means that we allocate from the first node. After we reach 40% 
> > of the node then we throttle? This is going to be a significant 
> > performance degradation since we can no longer use the memory of other 
> > nodes to buffer writeout.
> 
> That was what I was referring to.

Note that this was describing the behavior you wanted not the way things 
work. It is desired behavior not to use all the memory resources of the 
cpuset and slow down the system?



  reply	other threads:[~2007-01-17  1:30 UTC|newest]

Thread overview: 110+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-01-16  5:47 Christoph Lameter
2007-01-16  5:47 ` [RFC 1/8] Convert higest_possible_node_id() into nr_node_ids Christoph Lameter
2007-01-16 22:05   ` Andi Kleen
2007-01-17  3:14     ` Christoph Lameter
2007-01-17  4:15       ` Andi Kleen
2007-01-17  4:23         ` Christoph Lameter
2007-01-16  5:47 ` [RFC 2/8] Add a map to inodes to track dirty pages per node Christoph Lameter
2007-01-16  5:47 ` [RFC 3/8] Add a nodemask to pdflush functions Christoph Lameter
2007-01-16  5:48 ` [RFC 4/8] Per cpuset dirty ratio handling and writeout Christoph Lameter
2007-01-16  5:48 ` [RFC 5/8] Make writeout during reclaim cpuset aware Christoph Lameter
2007-01-16 22:07   ` Andi Kleen
2007-01-17  4:20     ` Paul Jackson
2007-01-17  4:28       ` Andi Kleen
2007-01-17  4:36         ` Paul Jackson
2007-01-17  5:59           ` Andi Kleen
2007-01-17  6:19             ` Christoph Lameter
2007-01-17  4:23     ` Christoph Lameter
2007-01-16  5:48 ` [RFC 6/8] Throttle vm writeout per cpuset Christoph Lameter
2007-01-16  5:48 ` [RFC 7/8] Exclude unreclaimable pages from dirty ration calculation Christoph Lameter
2007-01-18 15:48   ` Nikita Danilov
2007-01-18 19:56     ` Christoph Lameter
2007-01-16  5:48 ` [RFC 8/8] Reduce inode memory usage for systems with a high MAX_NUMNODES Christoph Lameter
2007-01-16 19:52   ` Paul Menage
2007-01-16 20:00     ` Christoph Lameter
2007-01-16 20:06       ` Paul Menage
2007-01-16 20:51         ` Christoph Lameter
2007-01-16  7:38 ` [RFC 0/8] Cpuset aware writeback Peter Zijlstra
2007-01-16 20:10   ` Christoph Lameter
2007-01-16  9:25 ` Paul Jackson
2007-01-16 17:13   ` Christoph Lameter
2007-01-16 21:53 ` Andrew Morton
2007-01-16 22:08   ` [PATCH] nfs: fix congestion control Peter Zijlstra
2007-01-16 22:27     ` Trond Myklebust
2007-01-17  2:41       ` Peter Zijlstra
2007-01-17  6:15         ` Trond Myklebust
2007-01-17  8:49           ` Peter Zijlstra
2007-01-17 13:50             ` Trond Myklebust
2007-01-17 14:29               ` Peter Zijlstra
2007-01-17 14:45                 ` Trond Myklebust
2007-01-17 20:05     ` Christoph Lameter
2007-01-17 21:52       ` Peter Zijlstra
2007-01-17 21:54         ` Trond Myklebust
2007-01-18 13:27           ` Peter Zijlstra
2007-01-18 15:49             ` Trond Myklebust
2007-01-19  9:33               ` Peter Zijlstra
2007-01-19 13:07                 ` Peter Zijlstra
2007-01-19 16:51                   ` Trond Myklebust
2007-01-19 17:54                     ` Peter Zijlstra
2007-01-19 17:20                   ` Christoph Lameter
2007-01-19 17:57                     ` Peter Zijlstra
2007-01-19 18:02                       ` Christoph Lameter
2007-01-19 18:26                       ` Trond Myklebust
2007-01-19 18:27                         ` Christoph Lameter
2007-01-20  7:01                         ` [PATCH] nfs: fix congestion control -v3 Peter Zijlstra
2007-01-22 16:12                           ` Trond Myklebust
2007-01-25 15:32                             ` [PATCH] nfs: fix congestion control -v4 Peter Zijlstra
2007-01-26  5:02                               ` Andrew Morton
2007-01-26  8:00                                 ` Peter Zijlstra
2007-01-26  8:50                                   ` Peter Zijlstra
2007-01-26  5:09                               ` Andrew Morton
2007-01-26  5:31                                 ` Christoph Lameter
2007-01-26  6:04                                   ` Andrew Morton
2007-01-26  6:53                                     ` Christoph Lameter
2007-01-26  8:03                                     ` Peter Zijlstra
2007-01-26  8:51                                       ` Andrew Morton
2007-01-26  9:01                                         ` Peter Zijlstra
2007-02-20 12:59                                         ` Peter Zijlstra
2007-01-22 17:59                           ` [PATCH] nfs: fix congestion control -v3 Christoph Lameter
2007-01-17 23:15     ` [PATCH] nfs: fix congestion control Christoph Hellwig
2007-01-16 22:15   ` [RFC 0/8] Cpuset aware writeback Christoph Lameter
2007-01-16 23:40     ` Andrew Morton
2007-01-17  0:16       ` Christoph Lameter
2007-01-17  1:07         ` Andrew Morton
2007-01-17  1:30           ` Christoph Lameter [this message]
2007-01-17  2:34             ` Andrew Morton
2007-01-17  3:40               ` Christoph Lameter
2007-01-17  4:02                 ` Paul Jackson
2007-01-17  4:05                 ` Andrew Morton
2007-01-17  6:27                   ` Christoph Lameter
2007-01-17  7:00                     ` Andrew Morton
2007-01-17  8:01                       ` Paul Jackson
2007-01-17  9:57                         ` Andrew Morton
2007-01-17 19:43                       ` Christoph Lameter
2007-01-17 22:10                         ` Andrew Morton
2007-01-18  1:10                           ` Christoph Lameter
2007-01-18  1:25                             ` Andrew Morton
2007-01-18  5:21                               ` Christoph Lameter
2007-01-16 23:44   ` David Chinner
2007-01-16 22:01 ` Andi Kleen
2007-01-16 22:18   ` Christoph Lameter
2007-02-02  1:38 ` Ethan Solomita
2007-02-02  2:16   ` Christoph Lameter
2007-02-02  4:03     ` Andrew Morton
2007-02-02  5:29       ` Christoph Lameter
2007-02-02  6:02         ` Neil Brown
2007-02-02  6:17           ` Christoph Lameter
2007-02-02  6:41             ` Neil Brown
2007-02-02  7:12         ` Andrew Morton
2007-03-21 21:11     ` Ethan Solomita
2007-03-21 21:29       ` Christoph Lameter
2007-03-21 21:52         ` Andrew Morton
2007-03-21 21:57           ` Christoph Lameter
2007-04-19  2:07         ` Ethan Solomita
2007-04-19  2:55           ` Christoph Lameter
2007-04-19  7:52             ` Ethan Solomita
2007-04-19 16:03               ` Christoph Lameter
2007-04-21  1:37             ` Ethan Solomita
2007-04-21  1:48               ` Christoph Lameter
2007-04-21  8:15                 ` Ethan Solomita
2007-04-21 15:40                   ` Christoph Lameter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.64.0701161709490.4455@schroedinger.engr.sgi.com \
    --to=clameter@sgi.com \
    --cc=ak@suse.de \
    --cc=akpm@osdl.org \
    --cc=dgc@sgi.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=menage@google.com \
    --cc=nickpiggin@yahoo.com.au \
    --cc=pj@sgi.com \
    --subject='Re: [RFC 0/8] Cpuset aware writeback' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).