LKML Archive on lore.kernel.org help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org> To: David Rientjes <rientjes@google.com> Cc: cl@linux-foundation.org, npiggin@suse.de, peterz@infradead.org, menage@google.com, dfults@sgi.com, linux-kernel@vger.kernel.org Subject: Re: [patch 1/7] cpusets: add dirty map to struct address_space Date: Tue, 4 Nov 2008 13:09:18 -0800 [thread overview] Message-ID: <20081104130918.bee16cff.akpm@linux-foundation.org> (raw) In-Reply-To: <alpine.DEB.2.00.0810300310180.18283@chino.kir.corp.google.com> On Thu, 30 Oct 2008 12:23:11 -0700 (PDT) David Rientjes <rientjes@google.com> wrote: > From: Christoph Lameter <cl@linux-foundation.org> > > In a NUMA system it is helpful to know where the dirty pages of a mapping > are located. That way we will be able to implement writeout for > applications that are constrained to a portion of the memory of the > system as required by cpusets. > > This patch implements the management of dirty node maps for an address > space through the following functions: > > cpuset_clear_dirty_nodes(mapping) Clear the map of dirty nodes > > cpuset_update_nodes(mapping, page) Record a node in the dirty nodes > map > > cpuset_init_dirty_nodes(mapping) Initialization of the map > > > The dirty map may be stored either directly in the mapping (for NUMA > systems with less then BITS_PER_LONG nodes) or separately allocated for > systems with a large number of nodes (f.e. ia64 with 1024 nodes). > > Updating the dirty map may involve allocating it first for large > configurations. Therefore, we protect the allocation and setting of a > node in the map through the tree_lock. The tree_lock is already taken > when a page is dirtied so there is no additional locking overhead if we > insert the updating of the nodemask there. > > The dirty map is only cleared (or freed) when the inode is cleared. At > that point no pages are attached to the inode anymore and therefore it > can be done without any locking. The dirty map therefore records all nodes > that have been used for dirty pages by that inode until the inode is no > longer used. > > If the mapping for an inode's set of dirty nodes does not intersect the > set under writeback, the inode is requeued for later. This is the > mechanism for enforcing the writeback control's set of nodes policy. > > ... > > +/* > + * We need macros since struct address_space is not defined yet sigh. Thanks for adding the comment. > + */ > +#if MAX_NUMNODES <= BITS_PER_LONG > +#define cpuset_init_dirty_nodes(__mapping) \ > + (__mapping)->dirty_nodes = NODE_MASK_NONE > + > +#define cpuset_update_dirty_nodes(__mapping, __page) \ > + node_set(page_to_nid(__page), (__mapping)->dirty_nodes); > + > +#define cpuset_clear_dirty_nodes(__mapping) \ > + (__mapping)->dirty_nodes = NODE_MASK_NONE > + > +#define cpuset_intersects_dirty_nodes(__mapping, __nodemask_ptr) \ > + (!(__nodemask_ptr) || \ > + nodes_intersects((__mapping)->dirty_nodes, *(__nodemask_ptr))) This macro can evaluate __nodemask_ptr one or two times, possibly causing code bloat, performance loss or bugs. Copying it into a local would fix that. > +#else > +struct address_space; When forward-declaring structures like this I'd suggest that the declaration be put right at the top of the header file, outside ifdefs. Because putting it inside an ifdef is just inviting compilation errors later on when someone assumes that address_space was declared and didn't test all config options. And put it at the top of the file because some unobservant person might later add more code to the header ahead of the address_space declaration which needs address_space. So they go and add a second forward declaration of the same thing. We've had that happen before. > +#define cpuset_init_dirty_nodes(__mapping) \ > + (__mapping)->dirty_nodes = NULL I think there are ways in which this sort of macro can cause compilation errors or even bugs. I forget all the scenarios, but we do know that whapping a `do { ... } while (0)' around it fixes them. > +extern void cpuset_update_dirty_nodes(struct address_space *mapping, > + struct page *page); > +extern void cpuset_clear_dirty_nodes(struct address_space *mapping); > +extern int cpuset_intersects_dirty_nodes(struct address_space *mapping, > + nodemask_t *mask); > +#endif > + > #else /* !CONFIG_CPUSETS */ > > static inline int cpuset_init_early(void) { return 0; } > @@ -163,6 +193,27 @@ static inline void rebuild_sched_domains(void) > partition_sched_domains(1, NULL, NULL); > } > > +struct address_space; > + > +static inline void cpuset_init_dirty_nodes(struct address_space *mapping) > +{ > +} > + > +static inline void cpuset_update_dirty_nodes(struct address_space *mapping, > + struct page *page) > +{ > +} OK, what's ging on here? I count _three_ definitions of cpuset_update_dirty_nodes(). One is a macro, one is a static inline stub, one is an out-of-line C function. What's up with that? Maybe `#if MAX_NUMNODES > BITS_PER_LONG' added the third leg? > +static inline void cpuset_clear_dirty_nodes(struct address_space *mapping) > +{ > +} > + > +static inline int cpuset_intersects_dirty_nodes(struct address_space *mapping, > + nodemask_t *mask) > +{ > + return 1; > +} > + > #endif /* !CONFIG_CPUSETS */ > > #endif /* _LINUX_CPUSET_H */ > diff --git a/include/linux/fs.h b/include/linux/fs.h > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -544,6 +544,13 @@ struct address_space { > spinlock_t private_lock; /* for use by the address_space */ > struct list_head private_list; /* ditto */ > struct address_space *assoc_mapping; /* ditto */ > +#ifdef CONFIG_CPUSETS > +#if MAX_NUMNODES <= BITS_PER_LONG > + nodemask_t dirty_nodes; /* nodes with dirty pages */ > +#else > + nodemask_t *dirty_nodes; /* pointer to mask, if dirty */ > +#endif > +#endif > } __attribute__((aligned(sizeof(long)))); eek. Increasing the size of the address_space (and hence of the inode) is a moderately big deal - there can be millions of these in memory. Which gets me thinking... Many inodes don't have any pagecache. ext3 directories, anything which was read via stat(), for a start. Would it be worthwhile optimising for this? Options would include - remove inode.i_data and dynamically allocate it. - move your dirty_nodes thing into radix_tree_node, dynamically allocate address_space.page_tree (and tree_lock). - etc. There's a little project for someone to be going on with. > /* > * On most architectures that alignment is already the case; but > diff --git a/include/linux/writeback.h b/include/linux/writeback.h > --- a/include/linux/writeback.h > +++ b/include/linux/writeback.h > @@ -72,6 +72,8 @@ struct writeback_control { > * so we use a single control to update them > */ > unsigned no_nrwrite_index_update:1; > + > + nodemask_t *nodes; /* Nodemask to writeback */ This one doesn't get ifdefs? > }; > > /* > diff --git a/kernel/cpuset.c b/kernel/cpuset.c > --- a/kernel/cpuset.c > +++ b/kernel/cpuset.c > @@ -16,6 +16,7 @@ > * 2006 Rework by Paul Menage to use generic cgroups > * 2008 Rework of the scheduler domains and CPU hotplug handling > * by Max Krasnyansky > + * 2008 Cpuset writeback by Christoph Lameter > * > * This file is subject to the terms and conditions of the GNU General Public > * License. See the file COPYING in the main directory of the Linux > @@ -2323,6 +2324,58 @@ int cpuset_mem_spread_node(void) > } > EXPORT_SYMBOL_GPL(cpuset_mem_spread_node); > > +#if MAX_NUMNODES > BITS_PER_LONG > +/* > + * Special functions for NUMA systems with a large number of nodes. The > + * nodemask is pointed to from the address_space structure. The attachment of > + * the dirty_nodes nodemask is protected by the tree_lock. The nodemask is > + * freed only when the inode is cleared (and therefore unused, thus no locking > + * is necessary). > + */ > +void cpuset_update_dirty_nodes(struct address_space *mapping, > + struct page *page) > +{ > + nodemask_t *nodes = mapping->dirty_nodes; > + int node = page_to_nid(page); > + > + if (!nodes) { > + nodes = kmalloc(sizeof(nodemask_t), GFP_ATOMIC); erk, OK, called from __set_page_dirty, needs to be atomic. What are the consequences when this allocation fails? I prefer `foo = kmalloc(sizeof(*foo), ...)', personally. It avoids duplicating information and means that PoorOldReviewer doesn't have to crawl back through the code checking that NastyYoungCoder got the type right.
next prev parent reply other threads:[~2008-11-04 21:09 UTC|newest] Thread overview: 53+ messages / expand[flat|nested] mbox.gz Atom feed top 2008-10-30 19:23 [patch 0/7] cpuset writeback throttling David Rientjes 2008-10-30 19:23 ` [patch 1/7] cpusets: add dirty map to struct address_space David Rientjes 2008-11-04 21:09 ` Andrew Morton [this message] 2008-11-04 21:20 ` Christoph Lameter 2008-11-04 21:42 ` Andrew Morton 2008-10-30 19:23 ` [patch 2/7] pdflush: allow the passing of a nodemask parameter David Rientjes 2008-10-30 19:23 ` [patch 3/7] mm: make page writeback obey cpuset constraints David Rientjes 2008-10-30 19:23 ` [patch 4/7] mm: cpuset aware reclaim writeout David Rientjes 2008-10-30 19:23 ` [patch 5/7] mm: throttle writeout with cpuset awareness David Rientjes 2008-10-30 19:23 ` [patch 6/7] cpusets: per cpuset dirty ratios David Rientjes 2008-10-30 19:23 ` [patch 7/7] cpusets: update documentation for writeback throttling David Rientjes 2008-10-30 21:08 ` [patch 0/7] cpuset " Dave Chinner 2008-10-30 21:33 ` Christoph Lameter 2008-10-30 22:03 ` Dave Chinner 2008-10-31 13:47 ` Christoph Lameter 2008-10-31 16:36 ` David Rientjes 2008-11-04 20:47 ` Andrew Morton 2008-11-04 20:53 ` Peter Zijlstra 2008-11-04 20:58 ` Christoph Lameter 2008-11-04 21:10 ` David Rientjes 2008-11-04 21:16 ` Andrew Morton 2008-11-04 21:21 ` Peter Zijlstra 2008-11-04 21:50 ` Andrew Morton 2008-11-04 22:17 ` Christoph Lameter 2008-11-04 22:35 ` Andrew Morton 2008-11-04 22:52 ` Christoph Lameter 2008-11-04 23:36 ` Andrew Morton 2008-11-05 1:31 ` KAMEZAWA Hiroyuki 2008-11-05 3:09 ` Andrew Morton 2008-11-05 2:45 ` Christoph Lameter 2008-11-05 3:05 ` Andrew Morton 2008-11-05 4:31 ` KAMEZAWA Hiroyuki 2008-11-10 9:02 ` Andrea Righi 2008-11-10 10:02 ` David Rientjes 2008-11-05 13:52 ` Christoph Lameter 2008-11-05 18:41 ` Andrew Morton 2008-11-05 20:21 ` Christoph Lameter 2008-11-05 20:31 ` Andrew Morton 2008-11-05 20:40 ` Christoph Lameter 2008-11-05 20:56 ` Andrew Morton 2008-11-05 21:28 ` Christoph Lameter 2008-11-05 21:55 ` Paul Menage 2008-11-05 22:04 ` David Rientjes 2008-11-06 1:34 ` KAMEZAWA Hiroyuki 2008-11-06 20:35 ` David Rientjes -- strict thread matches above, loose matches on Subject: below -- 2008-10-28 16:08 [patch 1/7] cpusets: add dirty map to struct address_space David Rientjes 2008-10-28 17:37 ` Peter Zijlstra 2008-10-28 20:48 ` David Rientjes 2008-10-29 1:13 ` David Rientjes 2008-10-29 2:24 ` David Rientjes 2008-10-30 8:38 ` Peter Zijlstra 2008-10-28 17:46 ` Peter Zijlstra 2008-10-28 19:19 ` David Rientjes
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20081104130918.bee16cff.akpm@linux-foundation.org \ --to=akpm@linux-foundation.org \ --cc=cl@linux-foundation.org \ --cc=dfults@sgi.com \ --cc=linux-kernel@vger.kernel.org \ --cc=menage@google.com \ --cc=npiggin@suse.de \ --cc=peterz@infradead.org \ --cc=rientjes@google.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).