LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Christoph Lameter <clameter@sgi.com>
To: akpm@osdl.org
Cc: Paul Menage <menage@google.com>,
	linux-kernel@vger.kernel.org,
	Nick Piggin <nickpiggin@yahoo.com.au>,
	linux-mm@kvack.org, Andi Kleen <ak@suse.de>,
	Paul Jackson <pj@sgi.com>, Dave Chinner <dgc@sgi.com>,
	Christoph Lameter <clameter@sgi.com>
Subject: [RFC 2/8] Add a map to inodes to track dirty pages per node
Date: Mon, 15 Jan 2007 21:47:53 -0800 (PST)	[thread overview]
Message-ID: <20070116054753.15358.57869.sendpatchset@schroedinger.engr.sgi.com> (raw)
In-Reply-To: <20070116054743.15358.77287.sendpatchset@schroedinger.engr.sgi.com>

Add a dirty map to the inode

In a NUMA system it is helpful to know where the dirty pages of a mapping
are located. That way we will be able to implement writeout for applications
that are constrained to a portion of the memory of the system as required by
cpusets.

Two functions are introduced to manage the dirty node map:

cpuset_clear_dirty_nodes() and cpuset_update_nodes(). Both are defined using
macros since the definition of struct inode may not be available in cpuset.h.

The dirty map is cleared when the inode is cleared. There is no
synchronization (except for atomic nature of node_set) for the dirty_map. The
only problem that could be done is that we do not write out an inode if a
node bit is not set. That is rare and will be impossibly rare if multiple pages
are involved. There is therefore a slight chance that we have missed a dirty
node if it just contains a single page. Which is likely tolerable.

This patch increases the size of struct inode for the NUMA case. For
most arches that only support up to 64 nodes this is simply adding one
unsigned long.

However, the default Itanium configuration allows for up to 1024 nodes.
On Itanium we add 128 byte per inode. A later patch will make the size of
the per node bit array dynamic so that the size of the inode slab caches
is properly sized.

Signed-off-by; Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.20-rc5/fs/fs-writeback.c
===================================================================
--- linux-2.6.20-rc5.orig/fs/fs-writeback.c	2007-01-12 12:54:26.000000000 -0600
+++ linux-2.6.20-rc5/fs/fs-writeback.c	2007-01-15 22:34:12.065241639 -0600
@@ -22,6 +22,7 @@
 #include <linux/blkdev.h>
 #include <linux/backing-dev.h>
 #include <linux/buffer_head.h>
+#include <linux/cpuset.h>
 #include "internal.h"
 
 /**
@@ -223,11 +224,13 @@ __sync_single_inode(struct inode *inode,
 			/*
 			 * The inode is clean, inuse
 			 */
+			cpuset_clear_dirty_nodes(inode);
 			list_move(&inode->i_list, &inode_in_use);
 		} else {
 			/*
 			 * The inode is clean, unused
 			 */
+			cpuset_clear_dirty_nodes(inode);
 			list_move(&inode->i_list, &inode_unused);
 		}
 	}
Index: linux-2.6.20-rc5/fs/inode.c
===================================================================
--- linux-2.6.20-rc5.orig/fs/inode.c	2007-01-12 12:54:26.000000000 -0600
+++ linux-2.6.20-rc5/fs/inode.c	2007-01-15 22:33:55.802081773 -0600
@@ -22,6 +22,7 @@
 #include <linux/bootmem.h>
 #include <linux/inotify.h>
 #include <linux/mount.h>
+#include <linux/cpuset.h>
 
 /*
  * This is needed for the following functions:
@@ -134,6 +135,7 @@ static struct inode *alloc_inode(struct 
 		inode->i_cdev = NULL;
 		inode->i_rdev = 0;
 		inode->dirtied_when = 0;
+		cpuset_clear_dirty_nodes(inode);
 		if (security_inode_alloc(inode)) {
 			if (inode->i_sb->s_op->destroy_inode)
 				inode->i_sb->s_op->destroy_inode(inode);
Index: linux-2.6.20-rc5/include/linux/fs.h
===================================================================
--- linux-2.6.20-rc5.orig/include/linux/fs.h	2007-01-12 12:54:26.000000000 -0600
+++ linux-2.6.20-rc5/include/linux/fs.h	2007-01-15 22:33:55.876307100 -0600
@@ -589,6 +589,9 @@ struct inode {
 	void			*i_security;
 #endif
 	void			*i_private; /* fs or device private pointer */
+#ifdef CONFIG_CPUSETS
+	nodemask_t		dirty_nodes;	/* Map of nodes with dirty pages */
+#endif
 };
 
 /*
Index: linux-2.6.20-rc5/mm/page-writeback.c
===================================================================
--- linux-2.6.20-rc5.orig/mm/page-writeback.c	2007-01-12 12:54:26.000000000 -0600
+++ linux-2.6.20-rc5/mm/page-writeback.c	2007-01-15 22:34:14.425802376 -0600
@@ -33,6 +33,7 @@
 #include <linux/syscalls.h>
 #include <linux/buffer_head.h>
 #include <linux/pagevec.h>
+#include <linux/cpuset.h>
 
 /*
  * The maximum number of pages to writeout in a single bdflush/kupdate
@@ -780,6 +781,7 @@ int __set_page_dirty_nobuffers(struct pa
 		if (mapping->host) {
 			/* !PageAnon && !swapper_space */
 			__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
+			cpuset_update_dirty_nodes(mapping->host, page);
 		}
 		return 1;
 	}
Index: linux-2.6.20-rc5/fs/buffer.c
===================================================================
--- linux-2.6.20-rc5.orig/fs/buffer.c	2007-01-12 12:54:26.000000000 -0600
+++ linux-2.6.20-rc5/fs/buffer.c	2007-01-15 22:34:14.459008443 -0600
@@ -42,6 +42,7 @@
 #include <linux/bitops.h>
 #include <linux/mpage.h>
 #include <linux/bit_spinlock.h>
+#include <linux/cpuset.h>
 
 static int fsync_buffers_list(spinlock_t *lock, struct list_head *list);
 static void invalidate_bh_lrus(void);
@@ -739,6 +740,7 @@ int __set_page_dirty_buffers(struct page
 	}
 	write_unlock_irq(&mapping->tree_lock);
 	__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
+	cpuset_update_dirty_nodes(mapping->host, page);
 	return 1;
 }
 EXPORT_SYMBOL(__set_page_dirty_buffers);
Index: linux-2.6.20-rc5/include/linux/cpuset.h
===================================================================
--- linux-2.6.20-rc5.orig/include/linux/cpuset.h	2007-01-12 12:54:26.000000000 -0600
+++ linux-2.6.20-rc5/include/linux/cpuset.h	2007-01-15 22:34:37.757947994 -0600
@@ -75,6 +75,19 @@ static inline int cpuset_do_slab_mem_spr
 
 extern void cpuset_track_online_nodes(void);
 
+/*
+ * We need macros since struct inode is not defined yet
+ */
+#define cpuset_update_dirty_nodes(__inode, __page) \
+	node_set(page_to_nid(__page), (__inode)->dirty_nodes)
+
+#define cpuset_clear_dirty_nodes(__inode) \
+		(__inode)->dirty_nodes = NODE_MASK_NONE
+
+#define cpuset_intersects_dirty_nodes(__inode, __nodemask_ptr) \
+		(!(__nodemask_ptr) || nodes_intersects((__inode)->dirty_nodes, \
+		*(__nodemask_ptr)))
+
 #else /* !CONFIG_CPUSETS */
 
 static inline int cpuset_init_early(void) { return 0; }
@@ -146,6 +159,17 @@ static inline int cpuset_do_slab_mem_spr
 
 static inline void cpuset_track_online_nodes(void) {}
 
+static inline void cpuset_update_dirty_nodes(struct inode *i,
+		struct page *page) {}
+
+static inline void cpuset_clear_dirty_nodes(struct inode *i) {}
+
+static inline int cpuset_intersects_dirty_nodes(struct inode *i,
+		nodemask_t *n)
+{
+	return 1;
+}
+
 #endif /* !CONFIG_CPUSETS */
 
 #endif /* _LINUX_CPUSET_H */

  parent reply	other threads:[~2007-01-16  5:48 UTC|newest]

Thread overview: 110+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-01-16  5:47 [RFC 0/8] Cpuset aware writeback Christoph Lameter
2007-01-16  5:47 ` [RFC 1/8] Convert higest_possible_node_id() into nr_node_ids Christoph Lameter
2007-01-16 22:05   ` Andi Kleen
2007-01-17  3:14     ` Christoph Lameter
2007-01-17  4:15       ` Andi Kleen
2007-01-17  4:23         ` Christoph Lameter
2007-01-16  5:47 ` Christoph Lameter [this message]
2007-01-16  5:47 ` [RFC 3/8] Add a nodemask to pdflush functions Christoph Lameter
2007-01-16  5:48 ` [RFC 4/8] Per cpuset dirty ratio handling and writeout Christoph Lameter
2007-01-16  5:48 ` [RFC 5/8] Make writeout during reclaim cpuset aware Christoph Lameter
2007-01-16 22:07   ` Andi Kleen
2007-01-17  4:20     ` Paul Jackson
2007-01-17  4:28       ` Andi Kleen
2007-01-17  4:36         ` Paul Jackson
2007-01-17  5:59           ` Andi Kleen
2007-01-17  6:19             ` Christoph Lameter
2007-01-17  4:23     ` Christoph Lameter
2007-01-16  5:48 ` [RFC 6/8] Throttle vm writeout per cpuset Christoph Lameter
2007-01-16  5:48 ` [RFC 7/8] Exclude unreclaimable pages from dirty ration calculation Christoph Lameter
2007-01-18 15:48   ` Nikita Danilov
2007-01-18 19:56     ` Christoph Lameter
2007-01-16  5:48 ` [RFC 8/8] Reduce inode memory usage for systems with a high MAX_NUMNODES Christoph Lameter
2007-01-16 19:52   ` Paul Menage
2007-01-16 20:00     ` Christoph Lameter
2007-01-16 20:06       ` Paul Menage
2007-01-16 20:51         ` Christoph Lameter
2007-01-16  7:38 ` [RFC 0/8] Cpuset aware writeback Peter Zijlstra
2007-01-16 20:10   ` Christoph Lameter
2007-01-16  9:25 ` Paul Jackson
2007-01-16 17:13   ` Christoph Lameter
2007-01-16 21:53 ` Andrew Morton
2007-01-16 22:08   ` [PATCH] nfs: fix congestion control Peter Zijlstra
2007-01-16 22:27     ` Trond Myklebust
2007-01-17  2:41       ` Peter Zijlstra
2007-01-17  6:15         ` Trond Myklebust
2007-01-17  8:49           ` Peter Zijlstra
2007-01-17 13:50             ` Trond Myklebust
2007-01-17 14:29               ` Peter Zijlstra
2007-01-17 14:45                 ` Trond Myklebust
2007-01-17 20:05     ` Christoph Lameter
2007-01-17 21:52       ` Peter Zijlstra
2007-01-17 21:54         ` Trond Myklebust
2007-01-18 13:27           ` Peter Zijlstra
2007-01-18 15:49             ` Trond Myklebust
2007-01-19  9:33               ` Peter Zijlstra
2007-01-19 13:07                 ` Peter Zijlstra
2007-01-19 16:51                   ` Trond Myklebust
2007-01-19 17:54                     ` Peter Zijlstra
2007-01-19 17:20                   ` Christoph Lameter
2007-01-19 17:57                     ` Peter Zijlstra
2007-01-19 18:02                       ` Christoph Lameter
2007-01-19 18:26                       ` Trond Myklebust
2007-01-19 18:27                         ` Christoph Lameter
2007-01-20  7:01                         ` [PATCH] nfs: fix congestion control -v3 Peter Zijlstra
2007-01-22 16:12                           ` Trond Myklebust
2007-01-25 15:32                             ` [PATCH] nfs: fix congestion control -v4 Peter Zijlstra
2007-01-26  5:02                               ` Andrew Morton
2007-01-26  8:00                                 ` Peter Zijlstra
2007-01-26  8:50                                   ` Peter Zijlstra
2007-01-26  5:09                               ` Andrew Morton
2007-01-26  5:31                                 ` Christoph Lameter
2007-01-26  6:04                                   ` Andrew Morton
2007-01-26  6:53                                     ` Christoph Lameter
2007-01-26  8:03                                     ` Peter Zijlstra
2007-01-26  8:51                                       ` Andrew Morton
2007-01-26  9:01                                         ` Peter Zijlstra
2007-02-20 12:59                                         ` Peter Zijlstra
2007-01-22 17:59                           ` [PATCH] nfs: fix congestion control -v3 Christoph Lameter
2007-01-17 23:15     ` [PATCH] nfs: fix congestion control Christoph Hellwig
2007-01-16 22:15   ` [RFC 0/8] Cpuset aware writeback Christoph Lameter
2007-01-16 23:40     ` Andrew Morton
2007-01-17  0:16       ` Christoph Lameter
2007-01-17  1:07         ` Andrew Morton
2007-01-17  1:30           ` Christoph Lameter
2007-01-17  2:34             ` Andrew Morton
2007-01-17  3:40               ` Christoph Lameter
2007-01-17  4:02                 ` Paul Jackson
2007-01-17  4:05                 ` Andrew Morton
2007-01-17  6:27                   ` Christoph Lameter
2007-01-17  7:00                     ` Andrew Morton
2007-01-17  8:01                       ` Paul Jackson
2007-01-17  9:57                         ` Andrew Morton
2007-01-17 19:43                       ` Christoph Lameter
2007-01-17 22:10                         ` Andrew Morton
2007-01-18  1:10                           ` Christoph Lameter
2007-01-18  1:25                             ` Andrew Morton
2007-01-18  5:21                               ` Christoph Lameter
2007-01-16 23:44   ` David Chinner
2007-01-16 22:01 ` Andi Kleen
2007-01-16 22:18   ` Christoph Lameter
2007-02-02  1:38 ` Ethan Solomita
2007-02-02  2:16   ` Christoph Lameter
2007-02-02  4:03     ` Andrew Morton
2007-02-02  5:29       ` Christoph Lameter
2007-02-02  6:02         ` Neil Brown
2007-02-02  6:17           ` Christoph Lameter
2007-02-02  6:41             ` Neil Brown
2007-02-02  7:12         ` Andrew Morton
2007-03-21 21:11     ` Ethan Solomita
2007-03-21 21:29       ` Christoph Lameter
2007-03-21 21:52         ` Andrew Morton
2007-03-21 21:57           ` Christoph Lameter
2007-04-19  2:07         ` Ethan Solomita
2007-04-19  2:55           ` Christoph Lameter
2007-04-19  7:52             ` Ethan Solomita
2007-04-19 16:03               ` Christoph Lameter
2007-04-21  1:37             ` Ethan Solomita
2007-04-21  1:48               ` Christoph Lameter
2007-04-21  8:15                 ` Ethan Solomita
2007-04-21 15:40                   ` Christoph Lameter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070116054753.15358.57869.sendpatchset@schroedinger.engr.sgi.com \
    --to=clameter@sgi.com \
    --cc=ak@suse.de \
    --cc=akpm@osdl.org \
    --cc=dgc@sgi.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=menage@google.com \
    --cc=nickpiggin@yahoo.com.au \
    --cc=pj@sgi.com \
    --subject='Re: [RFC 2/8] Add a map to inodes to track dirty pages per node' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).