LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Paul Jackson <pj@sgi.com>
To: Christoph Lameter <clameter@sgi.com>
Cc: akpm@osdl.org, menage@google.com, linux-kernel@vger.kernel.org,
	nickpiggin@yahoo.com.au, linux-mm@kvack.org, ak@suse.de,
	dgc@sgi.com, clameter@sgi.com
Subject: Re: [RFC 0/8] Cpuset aware writeback
Date: Tue, 16 Jan 2007 01:25:09 -0800	[thread overview]
Message-ID: <20070116012509.145ead36.pj@sgi.com> (raw)
In-Reply-To: <20070116054743.15358.77287.sendpatchset@schroedinger.engr.sgi.com>

Christoph wrote:
> Currently cpusets are not able to do proper writeback since
> dirty ratio calculations and writeback are all done for the system
> as a whole.

Thanks for tackling this - it is sorely needed.

I'm afraid my review will be mostly cosmetic; I'm not competent
to comment on the really interesting stuff.

> If we are in a cpuset then we select only inodes for writeback
> that have pages on the nodes of the cpuset.

Sorry - you tripped over a subtle distinction that happens to be on my
list of things to notice.

When cpusets are configured, -all- tasks are in a cpuset.  And
(correctly so, I trust) this patch doesn't look into the tasks cpuset
to see what nodes it allows.  Rather it looks to the mems_allowed field
in the task struct, which is equal to or (when set_mempolicy is used) a
subset of that tasks cpusets allowed nodes.

Perhaps the following phrasing would be more accurate:

  If CPUSETs are configured, then we select only the inodes for
  writeback that have dirty pages on that tasks mems_allowed nodes.

> Secondly we modify the dirty limit calculation to be based
> on the acctive cpuset.

As above, perhaps the following would be more accurate:

  Secondly we modify the dirty limit calculation to be based
  on the current tasks mems_allowed nodes.

> 1. The nodemask expands the inode structure significantly if the
> architecture allows a high number of nodes. This is only an issue
> for IA64. 

Should that logic be disabled if HOTPLUG is configured on?  Or is
nr_node_ids a valid upper limit on what could be plugged in, even on a
mythical HOTPLUG system?

> 2. The calculation of the per cpuset limits can require looping
> over a number of nodes which may bring the performance of get_dirty_limits
> near pre 2.6.18 performance

Could we cache these limits?  Perhaps they only need to be recalculated if
a tasks mems_allowed changes?

Separate question - what happens if a tasks mems_allowed changes while it is
dirtying pages?  We could easily end up with dirty pages on nodes that are
no longer allowed to the task.  Is there anyway that such a miscalculation
could cause us to do harmful things?

In patch 2/8:
> The dirty map is cleared when the inode is cleared. There is no
> synchronization (except for atomic nature of node_set) for the dirty_map. The
> only problem that could be done is that we do not write out an inode if a
> node bit is not set.

Does this mean that a dirty page could be left 'forever' in memory, unwritten,
exposing us to risk of data corruption on disk, from some write done weeks ago,
but unwritten, in the event of say a power loss?

Also in patch 2/8:
> +static inline void cpuset_update_dirty_nodes(struct inode *i,
> +		struct page *page) {}

Is an incomplete 'struct inode;' declaration needed here in cpuset.h,
to avoid a warning if compiling with CPUSETS not configured?

In patch 4/8:
> We now add per node information which I think is equal or less effort
> since there are less nodes than processors.

Not so on Paul Menage's fake NUMA nodes - he can have say 64 fake nodes on
a system with 2 or 4 CPUs and one real node.  But I guess that's ok ...

In patch 4/8:
> +#ifdef CONFIG_CPUSETS
> +	/*
> +	 * Calculate the limits relative to the current cpuset if necessary.
> +	 */
> +	if (unlikely(nodes &&
> +			!nodes_subset(node_online_map, *nodes))) {
> +		int node;
> +
> +		is_subset = 1;
> +		...
> +#ifdef CONFIG_HIGHMEM
> +			high_memory += NODE_DATA(node)
> +				->node_zones[ZONE_HIGHMEM]->present_pages;
> +#endif
> +			nr_mapped += node_page_state(node, NR_FILE_MAPPED) +
> +					node_page_state(node, NR_ANON_PAGES);
> +		}
> +	} else
> +#endif
> +	{

I'm wishing there was a clearer way to write the above code.  Nested
ifdef's and an ifdef block ending in an open 'else' and perhaps the first
#ifdef CONFIG_CPUSETS ever, outside of fs/proc/base.c ...

However I have no clue if such a clearer way exists.  Sorry.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401

  parent reply	other threads:[~2007-01-16  9:25 UTC|newest]

Thread overview: 110+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-01-16  5:47 Christoph Lameter
2007-01-16  5:47 ` [RFC 1/8] Convert higest_possible_node_id() into nr_node_ids Christoph Lameter
2007-01-16 22:05   ` Andi Kleen
2007-01-17  3:14     ` Christoph Lameter
2007-01-17  4:15       ` Andi Kleen
2007-01-17  4:23         ` Christoph Lameter
2007-01-16  5:47 ` [RFC 2/8] Add a map to inodes to track dirty pages per node Christoph Lameter
2007-01-16  5:47 ` [RFC 3/8] Add a nodemask to pdflush functions Christoph Lameter
2007-01-16  5:48 ` [RFC 4/8] Per cpuset dirty ratio handling and writeout Christoph Lameter
2007-01-16  5:48 ` [RFC 5/8] Make writeout during reclaim cpuset aware Christoph Lameter
2007-01-16 22:07   ` Andi Kleen
2007-01-17  4:20     ` Paul Jackson
2007-01-17  4:28       ` Andi Kleen
2007-01-17  4:36         ` Paul Jackson
2007-01-17  5:59           ` Andi Kleen
2007-01-17  6:19             ` Christoph Lameter
2007-01-17  4:23     ` Christoph Lameter
2007-01-16  5:48 ` [RFC 6/8] Throttle vm writeout per cpuset Christoph Lameter
2007-01-16  5:48 ` [RFC 7/8] Exclude unreclaimable pages from dirty ration calculation Christoph Lameter
2007-01-18 15:48   ` Nikita Danilov
2007-01-18 19:56     ` Christoph Lameter
2007-01-16  5:48 ` [RFC 8/8] Reduce inode memory usage for systems with a high MAX_NUMNODES Christoph Lameter
2007-01-16 19:52   ` Paul Menage
2007-01-16 20:00     ` Christoph Lameter
2007-01-16 20:06       ` Paul Menage
2007-01-16 20:51         ` Christoph Lameter
2007-01-16  7:38 ` [RFC 0/8] Cpuset aware writeback Peter Zijlstra
2007-01-16 20:10   ` Christoph Lameter
2007-01-16  9:25 ` Paul Jackson [this message]
2007-01-16 17:13   ` Christoph Lameter
2007-01-16 21:53 ` Andrew Morton
2007-01-16 22:08   ` [PATCH] nfs: fix congestion control Peter Zijlstra
2007-01-16 22:27     ` Trond Myklebust
2007-01-17  2:41       ` Peter Zijlstra
2007-01-17  6:15         ` Trond Myklebust
2007-01-17  8:49           ` Peter Zijlstra
2007-01-17 13:50             ` Trond Myklebust
2007-01-17 14:29               ` Peter Zijlstra
2007-01-17 14:45                 ` Trond Myklebust
2007-01-17 20:05     ` Christoph Lameter
2007-01-17 21:52       ` Peter Zijlstra
2007-01-17 21:54         ` Trond Myklebust
2007-01-18 13:27           ` Peter Zijlstra
2007-01-18 15:49             ` Trond Myklebust
2007-01-19  9:33               ` Peter Zijlstra
2007-01-19 13:07                 ` Peter Zijlstra
2007-01-19 16:51                   ` Trond Myklebust
2007-01-19 17:54                     ` Peter Zijlstra
2007-01-19 17:20                   ` Christoph Lameter
2007-01-19 17:57                     ` Peter Zijlstra
2007-01-19 18:02                       ` Christoph Lameter
2007-01-19 18:26                       ` Trond Myklebust
2007-01-19 18:27                         ` Christoph Lameter
2007-01-20  7:01                         ` [PATCH] nfs: fix congestion control -v3 Peter Zijlstra
2007-01-22 16:12                           ` Trond Myklebust
2007-01-25 15:32                             ` [PATCH] nfs: fix congestion control -v4 Peter Zijlstra
2007-01-26  5:02                               ` Andrew Morton
2007-01-26  8:00                                 ` Peter Zijlstra
2007-01-26  8:50                                   ` Peter Zijlstra
2007-01-26  5:09                               ` Andrew Morton
2007-01-26  5:31                                 ` Christoph Lameter
2007-01-26  6:04                                   ` Andrew Morton
2007-01-26  6:53                                     ` Christoph Lameter
2007-01-26  8:03                                     ` Peter Zijlstra
2007-01-26  8:51                                       ` Andrew Morton
2007-01-26  9:01                                         ` Peter Zijlstra
2007-02-20 12:59                                         ` Peter Zijlstra
2007-01-22 17:59                           ` [PATCH] nfs: fix congestion control -v3 Christoph Lameter
2007-01-17 23:15     ` [PATCH] nfs: fix congestion control Christoph Hellwig
2007-01-16 22:15   ` [RFC 0/8] Cpuset aware writeback Christoph Lameter
2007-01-16 23:40     ` Andrew Morton
2007-01-17  0:16       ` Christoph Lameter
2007-01-17  1:07         ` Andrew Morton
2007-01-17  1:30           ` Christoph Lameter
2007-01-17  2:34             ` Andrew Morton
2007-01-17  3:40               ` Christoph Lameter
2007-01-17  4:02                 ` Paul Jackson
2007-01-17  4:05                 ` Andrew Morton
2007-01-17  6:27                   ` Christoph Lameter
2007-01-17  7:00                     ` Andrew Morton
2007-01-17  8:01                       ` Paul Jackson
2007-01-17  9:57                         ` Andrew Morton
2007-01-17 19:43                       ` Christoph Lameter
2007-01-17 22:10                         ` Andrew Morton
2007-01-18  1:10                           ` Christoph Lameter
2007-01-18  1:25                             ` Andrew Morton
2007-01-18  5:21                               ` Christoph Lameter
2007-01-16 23:44   ` David Chinner
2007-01-16 22:01 ` Andi Kleen
2007-01-16 22:18   ` Christoph Lameter
2007-02-02  1:38 ` Ethan Solomita
2007-02-02  2:16   ` Christoph Lameter
2007-02-02  4:03     ` Andrew Morton
2007-02-02  5:29       ` Christoph Lameter
2007-02-02  6:02         ` Neil Brown
2007-02-02  6:17           ` Christoph Lameter
2007-02-02  6:41             ` Neil Brown
2007-02-02  7:12         ` Andrew Morton
2007-03-21 21:11     ` Ethan Solomita
2007-03-21 21:29       ` Christoph Lameter
2007-03-21 21:52         ` Andrew Morton
2007-03-21 21:57           ` Christoph Lameter
2007-04-19  2:07         ` Ethan Solomita
2007-04-19  2:55           ` Christoph Lameter
2007-04-19  7:52             ` Ethan Solomita
2007-04-19 16:03               ` Christoph Lameter
2007-04-21  1:37             ` Ethan Solomita
2007-04-21  1:48               ` Christoph Lameter
2007-04-21  8:15                 ` Ethan Solomita
2007-04-21 15:40                   ` Christoph Lameter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070116012509.145ead36.pj@sgi.com \
    --to=pj@sgi.com \
    --cc=ak@suse.de \
    --cc=akpm@osdl.org \
    --cc=clameter@sgi.com \
    --cc=dgc@sgi.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=menage@google.com \
    --cc=nickpiggin@yahoo.com.au \
    --subject='Re: [RFC 0/8] Cpuset aware writeback' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).