LKML Archive on lore.kernel.org
* [PATCH 00/13] writeback bug fixes and simplifications take 2
       [not found] <400401292.24795@ustc.edu.cn>
@ 2008-01-15 12:36 ` Fengguang Wu
       [not found]   ` <400401290.18034@ustc.edu.cn>
                     ` (12 more replies)
  2008-01-15 18:33 ` [PATCH 00/13] writeback bug fixes and simplifications take 2 Michael Rubin
  2008-01-18  7:51 ` Michael Rubin
  2 siblings, 13 replies; 20+ messages in thread
From: Fengguang Wu @ 2008-01-15 12:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Michael Rubin, Peter Zijlstra, linux-fsdevel, linux-kernel

Andrew,

This patchset mainly polishes the writeback queuing policies.
The main goals are:

(1) small files should not be starved by big dirty files
(2) sync as fast as possible for not-blocked inodes/pages
    - don't leave them out; no congestion_wait() in between them
(3) avoid busy iowait for blocked inodes
    - retry them in the next go of s_io (maybe at the next wakeup of pdflush)

The role of the queues:

s_dirty:   park for dirtied_when expiration
s_io:      park for io submission
s_more_io: for big dirty inodes; they will be retried in this run of pdflush
           (it ensures fairness between small/large files)
s_more_io_wait: for blocked inodes; they will be picked up in the next run of s_io
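The queue roles above can be condensed into a small routing table. What follows is a hypothetical user-space sketch, not kernel code: `enum toy_outcome` and `toy_pick_queue()` are invented names, and the mapping only restates the policy described in this cover letter (clean inodes move to inode_in_use/inode_unused and leave these lists).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical outcomes of handling one inode (toy names). */
enum toy_outcome {
	TOY_NEWLY_DIRTIED,	/* just dirtied: wait for dirtied_when to expire */
	TOY_EXPIRED,		/* dirtied_when expired: ready for io submission */
	TOY_SLICE_USED_UP,	/* big file, still dirty after its share */
	TOY_BLOCKED,		/* locked inode/buffers, or no progress made */
	TOY_CLEAN,		/* fully written back */
};

/* Name of the queue the cover letter assigns to each outcome. */
static const char *toy_pick_queue(enum toy_outcome o)
{
	switch (o) {
	case TOY_NEWLY_DIRTIED:
		return "s_dirty";
	case TOY_EXPIRED:
		return "s_io";
	case TOY_SLICE_USED_UP:
		return "s_more_io";		/* retried in this run */
	case TOY_BLOCKED:
		return "s_more_io_wait";	/* picked up at the next s_io fill */
	case TOY_CLEAN:
		return "none";			/* leaves the dirty-inode lists */
	}
	return "none";
}
```

For example, `toy_pick_queue(TOY_BLOCKED)` yields "s_more_io_wait": the inode is not retried until the next time s_io is filled.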


This patchset is in better shape, but still not ready for merge.
It begins with:

	[PATCH 01/13] writeback: revert 2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b      
	[PATCH 02/13] writeback: clear PAGECACHE_TAG_DIRTY for truncated page in block_write_full_page()

It then introduces more_io/more_io_wait-based policies:

	[PATCH 03/13] writeback: introduce writeback_control.more_io                  
	[PATCH 04/13] writeback: introduce super_block.s_more_io_wait                 
	[PATCH 05/13] writeback: merge duplicate code into writeback_some_pages()     
	[PATCH 06/13] writeback: defer writeback on not-all-pages-written             
	[PATCH 07/13] writeback: defer writeback on locked inode                      
	[PATCH 08/13] writeback: defer writeback on locked buffers                    
	[PATCH 09/13] writeback: requeue_io() on redirtied inode                      

And finishes with some code cleanups:

	[PATCH 10/13] writeback: introduce queue_dirty()                              
	[PATCH 11/13] writeback: queue_dirty() on memory-backed bdi                   
	[PATCH 12/13] writeback: remove redirty_tail()                                
	[PATCH 13/13] writeback: cleanup __sync_single_inode()                        

Diffstat:

 fs/buffer.c         |    2 
 fs/fs-writeback.c   |  121 +++++++++++++++---------------------------
 fs/super.c          |    1 
 include/linux/fs.h  |    1 
 mm/page-writeback.c |   46 +++++++--------
 5 files changed, 72 insertions(+), 99 deletions(-)

Regards,
Fengguang Wu
-- 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 01/13] writeback: revert 2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b
       [not found]   ` <400401290.18034@ustc.edu.cn>
@ 2008-01-15 12:36     ` Fengguang Wu
  0 siblings, 0 replies; 20+ messages in thread
From: Fengguang Wu @ 2008-01-15 12:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Michael Rubin, Peter Zijlstra, linux-fsdevel, linux-kernel

[-- Attachment #1: revert-more_io.patch --]
[-- Type: text/plain, Size: 2545 bytes --]

Revert 2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b.

Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
 fs/fs-writeback.c         |    2 --
 include/linux/writeback.h |    1 -
 mm/page-writeback.c       |    9 +++------
 3 files changed, 3 insertions(+), 9 deletions(-)

Index: linux-mm/include/linux/writeback.h
===================================================================
--- linux-mm.orig/include/linux/writeback.h
+++ linux-mm/include/linux/writeback.h
@@ -62,7 +62,6 @@ struct writeback_control {
 	unsigned for_reclaim:1;		/* Invoked from the page allocator */
 	unsigned for_writepages:1;	/* This is a writepages() call */
 	unsigned range_cyclic:1;	/* range_start is cyclic */
-	unsigned more_io:1;		/* more io to be dispatched */
 };
 
 /*
Index: linux-mm/fs/fs-writeback.c
===================================================================
--- linux-mm.orig/fs/fs-writeback.c
+++ linux-mm/fs/fs-writeback.c
@@ -475,8 +475,6 @@ int generic_sync_sb_inodes(struct super_
 		if (wbc->nr_to_write <= 0)
 			break;
 	}
-	if (!list_empty(&sb->s_more_io))
-		wbc->more_io = 1;
 	spin_unlock(&inode_lock);
 	return ret;		/* Leave any unwritten inodes on s_io */
 }
Index: linux-mm/mm/page-writeback.c
===================================================================
--- linux-mm.orig/mm/page-writeback.c
+++ linux-mm/mm/page-writeback.c
@@ -567,7 +567,6 @@ static void background_writeout(unsigned
 			global_page_state(NR_UNSTABLE_NFS) < background_thresh
 				&& min_pages <= 0)
 			break;
-		wbc.more_io = 0;
 		wbc.encountered_congestion = 0;
 		wbc.nr_to_write = MAX_WRITEBACK_PAGES;
 		wbc.pages_skipped = 0;
@@ -575,9 +574,8 @@ static void background_writeout(unsigned
 		min_pages -= MAX_WRITEBACK_PAGES - wbc.nr_to_write;
 		if (wbc.nr_to_write > 0 || wbc.pages_skipped > 0) {
 			/* Wrote less than expected */
-			if (wbc.encountered_congestion || wbc.more_io)
-				congestion_wait(WRITE, HZ/10);
-			else
+			congestion_wait(WRITE, HZ/10);
+			if (!wbc.encountered_congestion)
 				break;
 		}
 	}
@@ -642,12 +640,11 @@ static void wb_kupdate(unsigned long arg
 			global_page_state(NR_UNSTABLE_NFS) +
 			(inodes_stat.nr_inodes - inodes_stat.nr_unused);
 	while (nr_to_write > 0) {
-		wbc.more_io = 0;
 		wbc.encountered_congestion = 0;
 		wbc.nr_to_write = MAX_WRITEBACK_PAGES;
 		writeback_inodes(&wbc);
 		if (wbc.nr_to_write > 0) {
-			if (wbc.encountered_congestion || wbc.more_io)
+			if (wbc.encountered_congestion)
 				congestion_wait(WRITE, HZ/10);
 			else
 				break;	/* All the old data is written */

-- 


* [PATCH 02/13] writeback: clear PAGECACHE_TAG_DIRTY for truncated page in block_write_full_page()
       [not found]   ` <400401290.17576@ustc.edu.cn>
@ 2008-01-15 12:36     ` Fengguang Wu
  0 siblings, 0 replies; 20+ messages in thread
From: Fengguang Wu @ 2008-01-15 12:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Michael Rubin, Peter Zijlstra, linux-fsdevel, linux-kernel

[-- Attachment #1: ext2-fix.patch --]
[-- Type: text/plain, Size: 847 bytes --]

The `truncated' page in block_write_full_page() may stay around for a long time.
E.g. ext2_rmdir() sets i_size to 0, and the dir inode may then hang around
because someone still holds a reference to it.

So clear PAGECACHE_TAG_DIRTY to prevent pdflush from retrying and iowaiting on
it.
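Why does a set_page_writeback()/end_page_writeback() pair clear the tag? As we read the 2.6.24-era mm/page-writeback.c, test_set_page_writeback() drops PAGECACHE_TAG_DIRTY from the radix tree whenever the page itself is no longer PageDirty, which we assume is the case here because the dirty bit is cleared before ->writepage() runs. The toy model below covers only that tag bookkeeping; `struct toy_page` and both functions are hypothetical, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page plus its radix-tree tags. */
struct toy_page {
	bool page_dirty;	/* PG_dirty */
	bool page_writeback;	/* PG_writeback */
	bool tag_dirty;		/* PAGECACHE_TAG_DIRTY */
	bool tag_writeback;	/* PAGECACHE_TAG_WRITEBACK */
};

/* Tag updates when a page enters writeback, as modelled here. */
static void toy_set_page_writeback(struct toy_page *p)
{
	p->page_writeback = true;
	p->tag_writeback = true;
	if (!p->page_dirty)
		p->tag_dirty = false;	/* the side effect the fix relies on */
}

/* Tag updates when writeback completes. */
static void toy_end_page_writeback(struct toy_page *p)
{
	p->page_writeback = false;
	p->tag_writeback = false;
}
```

A truncated page that is no longer PageDirty but still tagged ends up untagged after the pair, so pdflush stops finding it via the dirty tag.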

Tested-by: Joerg Platte <jplatte@naasa.net>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
 fs/buffer.c |    2 ++
 1 files changed, 2 insertions(+)

Index: linux/fs/buffer.c
===================================================================
--- linux.orig/fs/buffer.c
+++ linux/fs/buffer.c
@@ -2820,7 +2820,9 @@ int block_write_full_page(struct page *p
 		 * freeable here, so the page does not leak.
 		 */
 		do_invalidatepage(page, 0);
+		set_page_writeback(page);
 		unlock_page(page);
+		end_page_writeback(page);
 		return 0; /* don't care */
 	}
 

-- 


* [PATCH 03/13] writeback: introduce writeback_control.more_io
       [not found]   ` <400401291.20012@ustc.edu.cn>
@ 2008-01-15 12:36     ` Fengguang Wu
  0 siblings, 0 replies; 20+ messages in thread
From: Fengguang Wu @ 2008-01-15 12:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Michael Rubin, Peter Zijlstra, linux-fsdevel, linux-kernel

[-- Attachment #1: writeback-more_io.patch --]
[-- Type: text/plain, Size: 1476 bytes --]

Introduce writeback_control.more_io to indicate that more I/O is
scheduled for this wakeup of pdflush.

Note that more_io is only updated on the _visited_ superblocks,
which prevents pdflush daemons from interfering with one another.

Cc: Michael Rubin <mrubin@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
 fs/fs-writeback.c         |    6 +++++-
 include/linux/writeback.h |    1 +
 2 files changed, 6 insertions(+), 1 deletion(-)

--- linux-mm.orig/fs/fs-writeback.c
+++ linux-mm/fs/fs-writeback.c
@@ -472,8 +472,12 @@ int generic_sync_sb_inodes(struct super_
 		iput(inode);
 		cond_resched();
 		spin_lock(&inode_lock);
-		if (wbc->nr_to_write <= 0)
+		if (wbc->nr_to_write <= 0) {
+			wbc->more_io = 1;
 			break;
+		}
+		if (!list_empty(&sb->s_more_io))
+			wbc->more_io = 1;
 	}
 	spin_unlock(&inode_lock);
 	return ret;		/* Leave any unwritten inodes on s_io */
--- linux-mm.orig/include/linux/writeback.h
+++ linux-mm/include/linux/writeback.h
@@ -62,6 +62,7 @@ struct writeback_control {
 	unsigned for_reclaim:1;		/* Invoked from the page allocator */
 	unsigned for_writepages:1;	/* This is a writepages() call */
 	unsigned range_cyclic:1;	/* range_start is cyclic */
+	unsigned more_io:1;		/* more io to be dispatched */
 };
 
 /*

-- 


* [PATCH 04/13] writeback: introduce super_block.s_more_io_wait
       [not found]   ` <400401291.24198@ustc.edu.cn>
@ 2008-01-15 12:36     ` Fengguang Wu
  0 siblings, 0 replies; 20+ messages in thread
From: Fengguang Wu @ 2008-01-15 12:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Michael Rubin, Peter Zijlstra, linux-fsdevel, linux-kernel

[-- Attachment #1: writeback-more_io_wait.patch --]
[-- Type: text/plain, Size: 3101 bytes --]

Introduce super_block.s_more_io_wait to park inodes that for some reason cannot
be synced immediately. They will be revisited at the next s_io enqueue time (<= 5s).

The new data flow after this patchset:

s_dirty --> s_io --> s_more_io/s_more_io_wait --+
             ^                                  |
	     |                                  |
	     +----------------------------------+

- to fill s_io:
		s_more_io +
		s_dirty(expired) +
		s_more_io_wait
				---> s_io
- to drain s_io:
		s_io -+--> clean inodes goto inode_in_use/inode_unused
		      |
		      +--> s_more_io
		      |
		      +--> s_more_io_wait

There are no ordering or starvation problems in the queues:
- s_dirty is now a strict FIFO queue
- inode.dirtied_when is only set when the inode is made dirty
- once expired, a dirty inode stays in the s_*io* queues until it is made clean
- the dirty inodes in s_*io* are revisited in order, hence small files won't
  be starved by big dirty files.
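The fill step can be sketched with plain arrays standing in for the kernel's list_heads. Assumptions: `struct toy_queue`, `toy_queue_io()` and the helpers are hypothetical; jiffies wrap-around is ignored; and the sketch models only which inodes move into s_io, not their position within it. Entries dirtied at or before `older_than_this` count as expired; since s_dirty is time-ordered, filtering gives the same result as move_expired_inodes() stopping at the first unexpired inode.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX 16

/* Hypothetical stand-in for a list_head: inode ids plus dirtied_when. */
struct toy_queue {
	int ids[TOY_MAX];
	unsigned long when[TOY_MAX];
	size_t len;
};

static void toy_push(struct toy_queue *q, int id, unsigned long when)
{
	assert(q->len < TOY_MAX);
	q->ids[q->len] = id;
	q->when[q->len] = when;
	q->len++;
}

/* Move all entries of src to dst and empty src (cf. list_splice_init). */
static void toy_splice_all(struct toy_queue *src, struct toy_queue *dst)
{
	for (size_t i = 0; i < src->len; i++)
		toy_push(dst, src->ids[i], src->when[i]);
	src->len = 0;
}

/* Move expired s_dirty entries to s_io; keep the others on s_dirty. */
static void toy_move_expired(struct toy_queue *dirty, struct toy_queue *io,
			     unsigned long older_than_this)
{
	size_t kept = 0;

	for (size_t i = 0; i < dirty->len; i++) {
		if (dirty->when[i] <= older_than_this) {
			toy_push(io, dirty->ids[i], dirty->when[i]);
		} else {
			dirty->ids[kept] = dirty->ids[i];
			dirty->when[kept] = dirty->when[i];
			kept++;
		}
	}
	dirty->len = kept;
}

/* Fill s_io from the three sources listed in the patch description. */
static void toy_queue_io(struct toy_queue *more_io, struct toy_queue *dirty,
			 struct toy_queue *more_io_wait, struct toy_queue *io,
			 unsigned long older_than_this)
{
	toy_splice_all(more_io, io);
	toy_move_expired(dirty, io, older_than_this);
	toy_splice_all(more_io_wait, io);
}
```

After one fill, s_more_io and s_more_io_wait are empty and s_dirty holds only inodes that have not yet expired.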

Cc: Michael Rubin <mrubin@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
 fs/fs-writeback.c  |   16 +++++++++++++---
 fs/super.c         |    1 +
 include/linux/fs.h |    1 +
 3 files changed, 15 insertions(+), 3 deletions(-)

--- linux-mm.orig/fs/fs-writeback.c
+++ linux-mm/fs/fs-writeback.c
@@ -172,6 +172,14 @@ static void requeue_io(struct inode *ino
 	list_move(&inode->i_list, &inode->i_sb->s_more_io);
 }
 
+/*
+ * The inode should be retried after _sleeping_ for a while.
+ */
+static void requeue_io_wait(struct inode *inode)
+{
+	list_move(&inode->i_list, &inode->i_sb->s_more_io_wait);
+}
+
 static void inode_sync_complete(struct inode *inode)
 {
 	/*
@@ -206,13 +214,15 @@ static void queue_io(struct super_block 
 {
 	list_splice_init(&sb->s_more_io, sb->s_io.prev);
 	move_expired_inodes(&sb->s_dirty, &sb->s_io, older_than_this);
+	list_splice_init(&sb->s_more_io_wait, sb->s_io.prev);
 }
 
 int sb_has_dirty_inodes(struct super_block *sb)
 {
-	return !list_empty(&sb->s_dirty) ||
-	       !list_empty(&sb->s_io) ||
-	       !list_empty(&sb->s_more_io);
+	return !list_empty(&sb->s_dirty)   ||
+	       !list_empty(&sb->s_io)      ||
+	       !list_empty(&sb->s_more_io) ||
+	       !list_empty(&sb->s_more_io_wait);
 }
 EXPORT_SYMBOL(sb_has_dirty_inodes);
 
--- linux-mm.orig/fs/super.c
+++ linux-mm/fs/super.c
@@ -64,6 +64,7 @@ static struct super_block *alloc_super(s
 		INIT_LIST_HEAD(&s->s_dirty);
 		INIT_LIST_HEAD(&s->s_io);
 		INIT_LIST_HEAD(&s->s_more_io);
+		INIT_LIST_HEAD(&s->s_more_io_wait);
 		INIT_LIST_HEAD(&s->s_files);
 		INIT_LIST_HEAD(&s->s_instances);
 		INIT_HLIST_HEAD(&s->s_anon);
--- linux-mm.orig/include/linux/fs.h
+++ linux-mm/include/linux/fs.h
@@ -1011,6 +1011,7 @@ struct super_block {
 	struct list_head	s_dirty;	/* dirty inodes */
 	struct list_head	s_io;		/* parked for writeback */
 	struct list_head	s_more_io;	/* parked for more writeback */
+	struct list_head	s_more_io_wait;	/* parked for sleep-then-retry */
 	struct hlist_head	s_anon;		/* anonymous dentries for (nfs) exporting */
 	struct list_head	s_files;
 

-- 


* [PATCH 05/13] writeback: merge duplicate code into writeback_some_pages()
       [not found]   ` <400401292.20625@ustc.edu.cn>
@ 2008-01-15 12:36     ` Fengguang Wu
  0 siblings, 0 replies; 20+ messages in thread
From: Fengguang Wu @ 2008-01-15 12:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Michael Rubin, Peter Zijlstra, linux-fsdevel, linux-kernel

[-- Attachment #1: writeback-writeback_some_pages.patch --]
[-- Type: text/plain, Size: 2860 bytes --]

Merge duplicate code from background_writeout() and wb_kupdate() into
writeback_some_pages().

The pages_skipped check in background_writeout() is dropped. An inode that
cannot be written now will be retried in the next run of pdflush, typically in 5s.
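The control flow of the merged helper can be exercised in user space by replacing writeback_inodes() with a stub. In this sketch the stub `toy_writeback_inodes()`, the `dirty_pages` counter, and the value 1024 for the batch size are assumptions; congestion_wait() is reduced to a counter. Following PATCH 03, the stub raises more_io whenever the budget is used up. The caller loop ends as soon as a round reports neither more_io nor congestion.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_WRITEBACK_PAGES 1024	/* assumed batch size */

/* Hypothetical subset of writeback_control used by the helper. */
struct toy_wbc {
	long nr_to_write;
	bool more_io;
	bool encountered_congestion;
};

static long dirty_pages;	/* stub state: pages still dirty */
static int congestion_waits;	/* how often we would have slept */

/* Stub for writeback_inodes(): write up to nr_to_write pages. */
static void toy_writeback_inodes(struct toy_wbc *wbc)
{
	long n = dirty_pages < wbc->nr_to_write ? dirty_pages : wbc->nr_to_write;

	dirty_pages -= n;
	wbc->nr_to_write -= n;
	if (wbc->nr_to_write <= 0)
		wbc->more_io = true;	/* budget used up, as in PATCH 03 */
}

/* Same shape as writeback_some_pages() in this patch. */
static bool toy_writeback_some_pages(struct toy_wbc *wbc, long nr)
{
	wbc->more_io = false;
	wbc->encountered_congestion = false;
	wbc->nr_to_write = nr;

	toy_writeback_inodes(wbc);

	if (wbc->encountered_congestion)
		congestion_waits++;	/* stands in for congestion_wait() */

	return wbc->more_io || wbc->encountered_congestion;
}

/* Caller loop shaped like the patched wb_kupdate(); returns rounds run. */
static int toy_kupdate(long nr_to_write)
{
	struct toy_wbc wbc = { 0 };
	int rounds = 0;

	while (nr_to_write > 0) {
		rounds++;
		if (!toy_writeback_some_pages(&wbc, TOY_MAX_WRITEBACK_PAGES))
			break;
		nr_to_write -= TOY_MAX_WRITEBACK_PAGES - wbc.nr_to_write;
	}
	return rounds;
}
```

With 2500 dirty pages, two full batches report more_io and the third, partial batch ends the loop without any simulated sleep.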

Cc: Michael Rubin <mrubin@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
 mm/page-writeback.c |   43 +++++++++++++++++++++---------------------
 1 files changed, 22 insertions(+), 21 deletions(-)

--- linux-mm.orig/mm/page-writeback.c
+++ linux-mm/mm/page-writeback.c
@@ -543,6 +543,24 @@ void throttle_vm_writeout(gfp_t gfp_mask
 }
 
 /*
+ * writeback up to @nr dirty pages.
+ * return true if there's more work.
+ */
+static int writeback_some_pages(struct writeback_control *wbc, int nr)
+{
+	wbc->more_io = 0;
+	wbc->encountered_congestion = 0;
+	wbc->nr_to_write = nr;
+
+	writeback_inodes(wbc);
+
+	if (wbc->encountered_congestion)
+		congestion_wait(WRITE, HZ/10);
+
+	return	wbc->more_io || wbc->encountered_congestion;
+}
+
+/*
  * writeback at least _min_pages, and keep writing until the amount of dirty
  * memory is less than the background threshold, or until we're all clean.
  */
@@ -553,7 +571,6 @@ static void background_writeout(unsigned
 		.bdi		= NULL,
 		.sync_mode	= WB_SYNC_NONE,
 		.older_than_this = NULL,
-		.nr_to_write	= 0,
 		.nonblocking	= 1,
 		.range_cyclic	= 1,
 	};
@@ -567,17 +584,9 @@ static void background_writeout(unsigned
 			global_page_state(NR_UNSTABLE_NFS) < background_thresh
 				&& min_pages <= 0)
 			break;
-		wbc.encountered_congestion = 0;
-		wbc.nr_to_write = MAX_WRITEBACK_PAGES;
-		wbc.pages_skipped = 0;
-		writeback_inodes(&wbc);
+		if (!writeback_some_pages(&wbc, MAX_WRITEBACK_PAGES))
+			break;
 		min_pages -= MAX_WRITEBACK_PAGES - wbc.nr_to_write;
-		if (wbc.nr_to_write > 0 || wbc.pages_skipped > 0) {
-			/* Wrote less than expected */
-			congestion_wait(WRITE, HZ/10);
-			if (!wbc.encountered_congestion)
-				break;
-		}
 	}
 }
 
@@ -625,7 +634,6 @@ static void wb_kupdate(unsigned long arg
 		.bdi		= NULL,
 		.sync_mode	= WB_SYNC_NONE,
 		.older_than_this = &oldest_jif,
-		.nr_to_write	= 0,
 		.nonblocking	= 1,
 		.for_kupdate	= 1,
 		.range_cyclic	= 1,
@@ -640,15 +648,8 @@ static void wb_kupdate(unsigned long arg
 			global_page_state(NR_UNSTABLE_NFS) +
 			(inodes_stat.nr_inodes - inodes_stat.nr_unused);
 	while (nr_to_write > 0) {
-		wbc.encountered_congestion = 0;
-		wbc.nr_to_write = MAX_WRITEBACK_PAGES;
-		writeback_inodes(&wbc);
-		if (wbc.nr_to_write > 0) {
-			if (wbc.encountered_congestion)
-				congestion_wait(WRITE, HZ/10);
-			else
-				break;	/* All the old data is written */
-		}
+		if (!writeback_some_pages(&wbc, MAX_WRITEBACK_PAGES))
+			break;
 		nr_to_write -= MAX_WRITEBACK_PAGES - wbc.nr_to_write;
 	}
 	if (time_before(next_jif, jiffies + HZ))

-- 


* [PATCH 06/13] writeback: defer writeback on not-all-pages-written
       [not found]   ` <400401292.17900@ustc.edu.cn>
@ 2008-01-15 12:36     ` Fengguang Wu
  0 siblings, 0 replies; 20+ messages in thread
From: Fengguang Wu @ 2008-01-15 12:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Michael Rubin, Peter Zijlstra, linux-fsdevel, linux-kernel

[-- Attachment #1: writeback-more_io_wait-a.patch --]
[-- Type: text/plain, Size: 2451 bytes --]

Convert to requeue_io_wait() for these cases:

	- kupdate cannot write all pages due to some blocking condition;
	- during sync, a file is being written to too fast, starving other
	  files.

In the case of sync, requeue_io_wait() can break the starvation because an
inode requeued into s_more_io_wait will be served _after_ normal inodes, hence
it won't stand in the way of other inodes in the next run.
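The new requeue decision reduces to one predicate. The sketch restates it as a pure function; `enum toy_queue` and `toy_requeue_target()` are hypothetical names, and the two inputs stand in for wbc->for_kupdate and wbc->nr_to_write at the point where the mapping is still tagged dirty.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_queue { TOY_S_MORE_IO, TOY_S_MORE_IO_WAIT };

/* Where an inode with pages still dirty goes after this patch: only a
 * kupdate pass that used up its slice requeues to s_more_io; every
 * other case is parked on s_more_io_wait. */
static enum toy_queue toy_requeue_target(bool for_kupdate, long nr_to_write)
{
	if (for_kupdate && nr_to_write <= 0)
		return TOY_S_MORE_IO;		/* slice used up: next turn */
	return TOY_S_MORE_IO_WAIT;		/* blocked, or fast writer in sync */
}
```

For instance, `toy_requeue_target(true, 10)` returns TOY_S_MORE_IO_WAIT: kupdate stopped with budget left over, which this patch treats as a blocked inode.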

Cc: Michael Rubin <mrubin@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
 fs/fs-writeback.c |   33 ++++++++-------------------------
 1 files changed, 8 insertions(+), 25 deletions(-)

--- linux-mm.orig/fs/fs-writeback.c
+++ linux-mm/fs/fs-writeback.c
@@ -275,37 +275,20 @@ __sync_single_inode(struct inode *inode,
 		    mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
 			/*
 			 * We didn't write back all the pages.  nfs_writepages()
-			 * sometimes bales out without doing anything. Redirty
-			 * the inode; Move it from s_io onto s_more_io/s_dirty.
+			 * sometimes bales out without doing anything.
 			 */
-			/*
-			 * akpm: if the caller was the kupdate function we put
-			 * this inode at the head of s_dirty so it gets first
-			 * consideration.  Otherwise, move it to the tail, for
-			 * the reasons described there.  I'm not really sure
-			 * how much sense this makes.  Presumably I had a good
-			 * reasons for doing it this way, and I'd rather not
-			 * muck with it at present.
-			 */
-			if (wbc->for_kupdate) {
+			inode->i_state |= I_DIRTY_PAGES;
+			if (wbc->for_kupdate && wbc->nr_to_write <= 0)
 				/*
-				 * For the kupdate function we move the inode
-				 * to s_more_io so it will get more writeout as
-				 * soon as the queue becomes uncongested.
+				 * slice used up: queue for next turn
 				 */
-				inode->i_state |= I_DIRTY_PAGES;
 				requeue_io(inode);
-			} else {
+			else
 				/*
-				 * Otherwise fully redirty the inode so that
-				 * other inodes on this superblock will get some
-				 * writeout.  Otherwise heavy writing to one
-				 * file would indefinitely suspend writeout of
-				 * all the other files.
+				 * 1) somehow blocked in kupdate: retry later
+				 * 2) fast writer during sync: give others a try
 				 */
-				inode->i_state |= I_DIRTY_PAGES;
-				redirty_tail(inode);
-			}
+				requeue_io_wait(inode);
 		} else if (inode->i_state & I_DIRTY) {
 			/*
 			 * Someone redirtied the inode while were writing back

-- 


* [PATCH 07/13] writeback: defer writeback on locked inode
       [not found]   ` <400401293.20627@ustc.edu.cn>
@ 2008-01-15 12:36     ` Fengguang Wu
  0 siblings, 0 replies; 20+ messages in thread
From: Fengguang Wu @ 2008-01-15 12:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Michael Rubin, Peter Zijlstra, linux-fsdevel, linux-kernel

[-- Attachment #1: writeback-more_io_wait-b.patch --]
[-- Type: text/plain, Size: 930 bytes --]

Convert to requeue_io_wait() for the case:

	the inode is locked.

Cc: Michael Rubin <mrubin@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
 fs/fs-writeback.c |    7 ++-----
 1 files changed, 2 insertions(+), 5 deletions(-)

--- linux-mm.orig/fs/fs-writeback.c
+++ linux-mm/fs/fs-writeback.c
@@ -329,12 +329,9 @@ __writeback_single_inode(struct inode *i
 	if ((wbc->sync_mode != WB_SYNC_ALL) && (inode->i_state & I_SYNC)) {
 		/*
 		 * We're skipping this inode because it's locked, and we're not
-		 * doing writeback-for-data-integrity.  Move it to s_more_io so
-		 * that writeback can proceed with the other inodes on s_io.
-		 * We'll have another go at writing back this inode when we
-		 * completed a full scan of s_io.
+		 * doing writeback-for-data-integrity. Recheck it after a while.
 		 */
-		requeue_io(inode);
+		requeue_io_wait(inode);
 		return 0;
 	}
 

-- 


* [PATCH 08/13] writeback: defer writeback on locked buffers
       [not found]   ` <400401293.21086@ustc.edu.cn>
@ 2008-01-15 12:36     ` Fengguang Wu
  0 siblings, 0 replies; 20+ messages in thread
From: Fengguang Wu @ 2008-01-15 12:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Michael Rubin, Peter Zijlstra, linux-fsdevel, linux-kernel

[-- Attachment #1: writeback-more_io_wait-d.patch --]
[-- Type: text/plain, Size: 632 bytes --]

Convert to requeue_io_wait() for the case:

	pages skipped due to locked buffers.

Cc: Michael Rubin <mrubin@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
 fs/fs-writeback.c |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

--- linux-mm.orig/fs/fs-writeback.c
+++ linux-mm/fs/fs-writeback.c
@@ -456,7 +456,7 @@ int generic_sync_sb_inodes(struct super_
 			 * writeback is not making progress due to locked
 			 * buffers.  Skip this inode for now.
 			 */
-			redirty_tail(inode);
+			requeue_io_wait(inode);
 		}
 		spin_unlock(&inode_lock);
 		iput(inode);

-- 


* [PATCH 09/13] writeback: requeue_io() on redirtied inode
       [not found]   ` <400401295.20625@ustc.edu.cn>
@ 2008-01-15 12:36     ` Fengguang Wu
  2008-01-16  8:13       ` David Chinner
  0 siblings, 1 reply; 20+ messages in thread
From: Fengguang Wu @ 2008-01-15 12:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Michael Rubin, Peter Zijlstra, linux-fsdevel, linux-kernel

[-- Attachment #1: writeback-remove-redirty-b.patch --]
[-- Type: text/plain, Size: 769 bytes --]

Redirtied inodes can show up under really fast writes.
They should be synced as soon as possible.

redirty_tail() could delay such an inode for up to 30s.
Kill the delay by using requeue_io() instead.
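Why up to 30s? redirty_tail() restamps dirtied_when to the current jiffies unless the inode is already the most recently dirtied one, so kupdate will not consider it expired again until dirty_expire_interval (30s by default) has passed. The sketch restates that timestamp rule as a pure function. `toy_time_after_eq()` copies the arithmetic of the kernel's time_after_eq() for unsigned long; `toy_redirty_stamp()` and its parameters are hypothetical and leave out the list move.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Same comparison as the kernel's time_after_eq(a, b); wrap-safe while
 * the stamps are less than LONG_MAX ticks apart (relies on the usual
 * two's-complement conversion, as the kernel does). */
static bool toy_time_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

/* New dirtied_when for an inode passed to redirty_tail(). newest_when is
 * the stamp of the inode at the head of s_dirty (most recently dirtied);
 * have_newest is false when s_dirty is empty. */
static unsigned long toy_redirty_stamp(unsigned long when, bool have_newest,
				       unsigned long newest_when,
				       unsigned long now)
{
	if (have_newest && !toy_time_after_eq(when, newest_when))
		return now;	/* older than the head: restamp, expiry restarts */
	return when;		/* empty list, or already newest: keep stamp */
}
```

An inode stamped 100 that is redirtied at 500 while the head of s_dirty is stamped 200 gets a new stamp of 500, so its expiry clock starts over; requeue_io() avoids that by never touching dirtied_when.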

Cc: Michael Rubin <mrubin@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
 fs/fs-writeback.c |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

--- linux-mm.orig/fs/fs-writeback.c
+++ linux-mm/fs/fs-writeback.c
@@ -294,7 +294,7 @@ __sync_single_inode(struct inode *inode,
 			 * Someone redirtied the inode while were writing back
 			 * the pages.
 			 */
-			redirty_tail(inode);
+			requeue_io(inode);
 		} else if (atomic_read(&inode->i_count)) {
 			/*
 			 * The inode is clean, inuse

-- 


* [PATCH 10/13] writeback: introduce queue_dirty()
       [not found]   ` <400401294.29514@ustc.edu.cn>
@ 2008-01-15 12:36     ` Fengguang Wu
  0 siblings, 0 replies; 20+ messages in thread
From: Fengguang Wu @ 2008-01-15 12:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Michael Rubin, Peter Zijlstra, linux-fsdevel, linux-kernel

[-- Attachment #1: writeback-queue_dirty.patch --]
[-- Type: text/plain, Size: 1542 bytes --]

Introduce queue_dirty() to enqueue a newly dirtied inode.
It helps remove duplicate code.

Cc: Michael Rubin <mrubin@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
 fs/fs-writeback.c |   21 +++++++++++++--------
 1 files changed, 13 insertions(+), 8 deletions(-)

--- linux-mm.orig/fs/fs-writeback.c
+++ linux-mm/fs/fs-writeback.c
@@ -25,6 +25,15 @@
 #include <linux/buffer_head.h>
 #include "internal.h"
 
+/*
+ * Enqueue a newly dirtied inode.
+ */
+static void queue_dirty(struct inode *inode)
+{
+	inode->dirtied_when = jiffies;
+	list_move(&inode->i_list, &inode->i_sb->s_dirty);
+}
+
 /**
  *	__mark_inode_dirty -	internal function
  *	@inode: inode to mark
@@ -122,10 +131,8 @@ void __mark_inode_dirty(struct inode *in
 		 * If the inode was already on s_dirty/s_io/s_more_io, don't
 		 * reposition it (that would break s_dirty time-ordering).
 		 */
-		if (!was_dirty) {
-			inode->dirtied_when = jiffies;
-			list_move(&inode->i_list, &sb->s_dirty);
-		}
+		if (!was_dirty)
+			queue_dirty(inode);
 	}
 out:
 	spin_unlock(&inode_lock);
@@ -445,10 +452,8 @@ int generic_sync_sb_inodes(struct super_
 		err = __writeback_single_inode(inode, wbc);
 		if (!ret)
 			ret = err;
-		if (wbc->sync_mode == WB_SYNC_HOLD) {
-			inode->dirtied_when = jiffies;
-			list_move(&inode->i_list, &sb->s_dirty);
-		}
+		if (wbc->sync_mode == WB_SYNC_HOLD)
+			queue_dirty(inode);
 		if (current_is_pdflush())
 			writeback_release(bdi);
 		if (wbc->pages_skipped != pages_skipped) {

-- 


* [PATCH 11/13] writeback: queue_dirty() on memory-backed bdi
       [not found]   ` <400401293.44721@ustc.edu.cn>
@ 2008-01-15 12:36     ` Fengguang Wu
  0 siblings, 0 replies; 20+ messages in thread
From: Fengguang Wu @ 2008-01-15 12:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Michael Rubin, Peter Zijlstra, linux-fsdevel, linux-kernel

[-- Attachment #1: writeback-remove-redirty-a.patch --]
[-- Type: text/plain, Size: 642 bytes --]

Replace redirty_tail() with queue_dirty() on memory-backed bdi.
It makes no functional difference; the code is just simpler.

Cc: Michael Rubin <mrubin@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
 fs/fs-writeback.c |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

--- linux-mm.orig/fs/fs-writeback.c
+++ linux-mm/fs/fs-writeback.c
@@ -407,7 +407,7 @@ int generic_sync_sb_inodes(struct super_
 		int err;
 
 		if (!bdi_cap_writeback_dirty(bdi)) {
-			redirty_tail(inode);
+			queue_dirty(inode);
 			if (sb_is_blkdev_sb(sb)) {
 				/*
 				 * Dirty memory-backed blockdev: the ramdisk

-- 


* [PATCH 12/13] writeback: remove redirty_tail()
       [not found]   ` <400401294.78786@ustc.edu.cn>
@ 2008-01-15 12:36     ` Fengguang Wu
  0 siblings, 0 replies; 20+ messages in thread
From: Fengguang Wu @ 2008-01-15 12:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Michael Rubin, Peter Zijlstra, linux-fsdevel, linux-kernel

[-- Attachment #1: writeback-remove-redirty_tail.patch --]
[-- Type: text/plain, Size: 1342 bytes --]

Remove redirty_tail(). It's no longer used.

Cc: Michael Rubin <mrubin@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
 fs/fs-writeback.c |   24 ------------------------
 1 files changed, 24 deletions(-)

--- linux-mm.orig/fs/fs-writeback.c
+++ linux-mm/fs/fs-writeback.c
@@ -148,30 +148,6 @@ static int write_inode(struct inode *ino
 }
 
 /*
- * Redirty an inode: set its when-it-was dirtied timestamp and move it to the
- * furthest end of its superblock's dirty-inode list.
- *
- * Before stamping the inode's ->dirtied_when, we check to see whether it is
- * already the most-recently-dirtied inode on the s_dirty list.  If that is
- * the case then the inode must have been redirtied while it was being written
- * out and we don't reset its dirtied_when.
- */
-static void redirty_tail(struct inode *inode)
-{
-	struct super_block *sb = inode->i_sb;
-
-	if (!list_empty(&sb->s_dirty)) {
-		struct inode *tail_inode;
-
-		tail_inode = list_entry(sb->s_dirty.next, struct inode, i_list);
-		if (!time_after_eq(inode->dirtied_when,
-				tail_inode->dirtied_when))
-			inode->dirtied_when = jiffies;
-	}
-	list_move(&inode->i_list, &sb->s_dirty);
-}
-
-/*
  * requeue inode for re-scanning after sb->s_io list is exhausted.
  */
 static void requeue_io(struct inode *inode)

-- 


* [PATCH 13/13] writeback: cleanup __sync_single_inode()
       [not found]   ` <400401294.20012@ustc.edu.cn>
@ 2008-01-15 12:36     ` Fengguang Wu
  0 siblings, 0 replies; 20+ messages in thread
From: Fengguang Wu @ 2008-01-15 12:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Michael Rubin, Peter Zijlstra, linux-fsdevel, linux-kernel

[-- Attachment #1: writeback-simplify-x.patch --]
[-- Type: text/plain, Size: 1350 bytes --]

Straighten out the if-else chain in __sync_single_inode().
No behavior change.
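The "no behavior change" claim is small enough to check exhaustively. The sketch abstracts both branch chains into functions over three booleans (I_DIRTY set, mapping tagged dirty, i_count non-zero) and reduces each branch body to a label; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_branch {
	TOY_PARTIAL,		/* pages left: requeue_io() or requeue_io_wait() */
	TOY_REDIRTIED,		/* inode redirtied: requeue_io() */
	TOY_CLEAN_INUSE,	/* clean, still referenced */
	TOY_CLEAN_UNUSED,	/* clean, unreferenced */
};

/* Branch order before this patch. */
static enum toy_branch toy_old(bool dirty, bool tagged, bool in_use)
{
	if (!dirty && tagged)
		return TOY_PARTIAL;
	if (dirty)
		return TOY_REDIRTIED;
	if (in_use)
		return TOY_CLEAN_INUSE;
	return TOY_CLEAN_UNUSED;
}

/* Branch order after this patch. */
static enum toy_branch toy_new(bool dirty, bool tagged, bool in_use)
{
	if (dirty)
		return TOY_REDIRTIED;
	if (tagged)
		return TOY_PARTIAL;
	if (in_use)
		return TOY_CLEAN_INUSE;
	return TOY_CLEAN_UNUSED;
}

/* Compare both chains on all eight input combinations. */
static bool toy_chains_agree(void)
{
	for (int m = 0; m < 8; m++) {
		bool dirty = m & 1, tagged = m & 2, in_use = m & 4;

		if (toy_old(dirty, tagged, in_use) != toy_new(dirty, tagged, in_use))
			return false;
	}
	return true;
}
```

Testing I_DIRTY first makes the `!(inode->i_state & I_DIRTY)` guard on the partial-write branch redundant, which is all this patch removes.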

Cc: Michael Rubin <mrubin@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
 fs/fs-writeback.c |   15 +++++++--------
 1 files changed, 7 insertions(+), 8 deletions(-)

--- linux-mm.orig/fs/fs-writeback.c
+++ linux-mm/fs/fs-writeback.c
@@ -254,8 +254,13 @@ __sync_single_inode(struct inode *inode,
 	spin_lock(&inode_lock);
 	inode->i_state &= ~I_SYNC;
 	if (!(inode->i_state & I_FREEING)) {
-		if (!(inode->i_state & I_DIRTY) &&
-		    mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
+		if (inode->i_state & I_DIRTY) {
+			/*
+			 * Someone redirtied the inode while were writing back
+			 * the pages.
+			 */
+			requeue_io(inode);
+		} else if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
 			/*
 			 * We didn't write back all the pages.  nfs_writepages()
 			 * sometimes bales out without doing anything.
@@ -272,12 +277,6 @@ __sync_single_inode(struct inode *inode,
 				 * 2) fast writer during sync: give others a try
 				 */
 				requeue_io_wait(inode);
-		} else if (inode->i_state & I_DIRTY) {
-			/*
-			 * Someone redirtied the inode while were writing back
-			 * the pages.
-			 */
-			requeue_io(inode);
 		} else if (atomic_read(&inode->i_count)) {
 			/*
 			 * The inode is clean, inuse

-- 


* Re: [PATCH 00/13] writeback bug fixes and simplifications take 2
       [not found] <400401292.24795@ustc.edu.cn>
  2008-01-15 12:36 ` [PATCH 00/13] writeback bug fixes and simplifications take 2 Fengguang Wu
@ 2008-01-15 18:33 ` Michael Rubin
       [not found]   ` <400449941.28506@ustc.edu.cn>
  2008-01-18  7:51 ` Michael Rubin
  2 siblings, 1 reply; 20+ messages in thread
From: Michael Rubin @ 2008-01-15 18:33 UTC (permalink / raw)
  To: Fengguang Wu; +Cc: Andrew Morton, Peter Zijlstra, linux-fsdevel, linux-kernel

On Jan 15, 2008 4:36 AM, Fengguang Wu <wfg@mail.ustc.edu.cn> wrote:
> Andrew,
>
> This patchset mainly polishes the writeback queuing policies.

Anyone know which tree this patchset is based on?

> The main goals are:
>
> (1) small files should not be starved by big dirty files
> (2) sync as fast as possible for not-blocked inodes/pages
>     - don't leave them out; no congestion_wait() in between them
> (3) avoid busy iowait for blocked inodes
>     - retry them in the next go of s_io(maybe at the next wakeup of pdflush)
>

Fengguang do you have any specific tests for any of these cases? As I
have posted earlier I am putting together a writeback test suite for
test.kernel.org and if you have one (even if it's an ugly shell
script) that would save me some time.

Also if you want any of mine let me know. :-)

mrubin




* Re: [PATCH 00/13] writeback bug fixes and simplifications take 2
From: Fengguang Wu @ 2008-01-16  2:18 UTC (permalink / raw)
  To: Michael Rubin; +Cc: Andrew Morton, Peter Zijlstra, linux-fsdevel, linux-kernel

On Tue, Jan 15, 2008 at 10:33:01AM -0800, Michael Rubin wrote:
> On Jan 15, 2008 4:36 AM, Fengguang Wu <wfg@mail.ustc.edu.cn> wrote:
> > Andrew,
> >
> > This patchset mainly polishes the writeback queuing policies.
> 
> Does anyone know which tree this patchset is based on?

They are against the latest -mm tree, i.e. 2.6.24-rc6-mm1.

> > The main goals are:
> >
> > (1) small files should not be starved by big dirty files
> > (2) sync as fast as possible for not-blocked inodes/pages
> >     - don't leave them out; no congestion_wait() in between them
> > (3) avoid busy iowait for blocked inodes
> >     - retry them in the next go of s_io (maybe at the next wakeup of pdflush)
> >
> 
> Fengguang, do you have any specific tests for any of these cases? As I
> posted earlier, I am putting together a writeback test suite for
> test.kernel.org, and if you have one (even if it's an ugly shell
> script) that would save me some time.

No, I just run tests with cp/dd etc. I analyze the code and debug
traces a lot, so I know it works in the situations I can imagine.
But dedicated test suites are valuable in the long term.

> Also if you want any of mine let me know. :-)

OK, thank you.

Fengguang
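A dedicated test for goal (1) could start from a toy user-space model rather than the kernel itself. The sketch below is hypothetical: `struct model_inode`, `simulate()` and the per-turn `chunk` page budget are illustrative names, not fs-writeback.c code. It compares round-robin requeueing through an s_more_io-like list against writing each inode to completion before moving on, and records the total number of pages written when each inode became clean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_INODES 16

/* Hypothetical model: an "inode" is only a count of dirty pages. */
struct model_inode {
    long dirty_pages;
    long finished_at;   /* total pages written (all inodes) when this one became clean */
};

/*
 * Write back n inodes taken from a toy s_io in order. Each turn an inode
 * may write at most 'chunk' pages. With use_more_io, an inode that is still
 * dirty after its turn goes to a toy s_more_io and is retried only after
 * every other inode in s_io had its turn. Without it, the same inode is
 * written again immediately until clean (the starvation case).
 */
static void simulate(struct model_inode *inodes, size_t n, long chunk,
                     bool use_more_io)
{
    size_t s_io[MAX_INODES], s_more_io[MAX_INODES];
    size_t io_len = 0, more_len = 0;
    long written = 0;

    assert(n <= MAX_INODES && chunk > 0);
    for (size_t i = 0; i < n; i++)
        s_io[io_len++] = i;

    while (io_len > 0) {
        for (size_t k = 0; k < io_len; k++) {
            struct model_inode *ino = &inodes[s_io[k]];

            do {
                long w = ino->dirty_pages < chunk ? ino->dirty_pages : chunk;
                ino->dirty_pages -= w;
                written += w;
            } while (!use_more_io && ino->dirty_pages > 0);

            if (ino->dirty_pages > 0)
                s_more_io[more_len++] = s_io[k];   /* retry in a later turn */
            else
                ino->finished_at = written;
        }
        /* s_io is drained: s_more_io becomes the next s_io */
        for (size_t k = 0; k < more_len; k++)
            s_io[k] = s_more_io[k];
        io_len = more_len;
        more_len = 0;
    }
}
```

With a 10000-page file queued ahead of a 10-page file and a 1024-page chunk (an illustrative value), the small file is clean after 1034 pages in the round-robin case, but only after 10010 pages when the big file is written to completion first.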



* Re: [PATCH 09/13] writeback: requeue_io() on redirtied inode
From: David Chinner @ 2008-01-16  8:13 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: Andrew Morton, Michael Rubin, Peter Zijlstra, linux-fsdevel,
	linux-kernel

On Tue, Jan 15, 2008 at 08:36:46PM +0800, Fengguang Wu wrote:
> Redirtied inodes could be seen in really fast writes.
> They should really be synced as soon as possible.
> 
> redirty_tail() could delay the inode for up to 30s.
> Kill the delay by using requeue_io() instead.

That's actually bad for anything that does delayed allocation
or updates state on data I/O completion.

e.g. XFS, when doing delalloc writes past EOF, dirties the inode
during writeout (allocation) and then updates the file size on data
I/O completion, hence dirtying the inode again.

With this change, writing the last pages out would result
in hitting this code and causing the inode to be flushed very
soon after the data write. Then, after the inode write is issued,
we get data I/O completion which dirties the inode again,
resulting in needing to write the inode again to clean it.
i.e. it introduces a potential new and useless inode write
I/O.

Also, the immediate inode write may be useless for XFS because the
inode may be pinned in memory due to async transactions
still in flight (e.g. from delalloc), so we've got two
situations where flushing the inode immediately is suboptimal.

Hence I don't think this is an optimisation that should be made
in the generic writeback code.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
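David's scenario can be reduced to a toy count of inode writes. Assume, purely for illustration, that the delalloc inode is dirtied at allocation time t = 0 ms and again by the file-size update at data I/O completion, t = 10 ms. A flush delay of 0 stands in for requeue_io()'s near-immediate retry and 30000 ms for the "up to 30s" redirty_tail() delay. `count_inode_writes()` is a hypothetical helper, not kernel code, and it ignores the pinned-inode case.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model (not kernel code): count the inode writes needed to leave an
 * inode clean, given the ascending times (ms) at which it gets dirtied.
 * When a clean inode is dirtied at t, an inode write is issued at
 * t + flush_delay_ms; that write covers every dirtying at or before its
 * issue time. A later dirtying starts a new cycle and costs another write.
 */
static int count_inode_writes(const long *dirty_ms, size_t n,
                              long flush_delay_ms)
{
    int writes = 0;
    size_t i = 0;

    assert(dirty_ms != NULL || n == 0);
    while (i < n) {
        long flush_at = dirty_ms[i] + flush_delay_ms;  /* schedule a write */

        writes++;
        while (i < n && dirty_ms[i] <= flush_at)
            i++;                                       /* covered by it */
    }
    return writes;
}
```

For the events {0, 10}, a zero delay costs two inode writes while a 30 s delay costs one; the difference is the extra, useless inode write I/O described above.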


* Re: [PATCH 09/13] writeback: requeue_io() on redirtied inode
From: Fengguang Wu @ 2008-01-17  4:22 UTC (permalink / raw)
  To: David Chinner
  Cc: Andrew Morton, Michael Rubin, Peter Zijlstra, linux-fsdevel,
	linux-kernel

On Wed, Jan 16, 2008 at 07:13:07PM +1100, David Chinner wrote:
> On Tue, Jan 15, 2008 at 08:36:46PM +0800, Fengguang Wu wrote:
> > Redirtied inodes could be seen in really fast writes.
> > They should really be synced as soon as possible.
> > 
> > redirty_tail() could delay the inode for up to 30s.
> > Kill the delay by using requeue_io() instead.
> 
> That's actually bad for anything that does delayed allocation
> or updates state on data I/O completion.
> 
> e.g. XFS, when doing delalloc writes past EOF, dirties the inode
> during writeout (allocation) and then updates the file size on data
> I/O completion, hence dirtying the inode again.
> 
> With this change, writing the last pages out would result
> in hitting this code and causing the inode to be flushed very
> soon after the data write. Then, after the inode write is issued,
> we get data I/O completion which dirties the inode again,
> resulting in needing to write the inode again to clean it.
> i.e. it introduces a potential new and useless inode write
> I/O.
> 
> Also, the immediate inode write may be useless for XFS because the
> inode may be pinned in memory due to async transactions
> still in flight (e.g. from delalloc), so we've got two
> situations where flushing the inode immediately is suboptimal.
> 
> Hence I don't think this is an optimisation that should be made
> in the generic writeback code.

Thanks for the explanation.
I can confirm that requeue_io() was called many times for the same XFS inode:
[  158.794562] requeue_io 328: inode 5243009 size 34647 at 03:03(hda3)
[  158.794827] mm/page-writeback.c 668 wb_kupdate: pdflush(183) 14209 global 486 10 0 wc _M tw 1013 sk 0
[  158.795293] requeue_io 328: inode 5243009 size 34647 at 03:03(hda3)
[  158.795313] mm/page-writeback.c 668 wb_kupdate: pdflush(183) 14198 global 486 10 0 wc _M tw 1024 sk 0
...
[  170.713900] requeue_io 328: inode 5243009 size 34647 at 03:03(hda3)
[  170.713925] mm/page-writeback.c 668 wb_kupdate: pdflush(183) 14198 global 1875 0 0 wc _M tw 1024 sk 0
[  170.813584] mm/page-writeback.c 668 wb_kupdate: pdflush(183) 14198 global 2855 0 0 wc __ tw 1024 sk 0
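The trace format above comes from Fengguang's own debug instrumentation, not from a mainline tracepoint. Assuming captured lines keep the `requeue_io <n>: inode <ino> ...` layout shown, a small filter such as the hypothetical `count_requeues()` below could tally how often a given inode is requeued.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Count "requeue_io" trace lines that mention inode 'ino'.
 * Assumes the ad-hoc debug format shown in the thread, e.g.
 *   [  158.794562] requeue_io 328: inode 5243009 size 34647 at 03:03(hda3)
 * This is output of an out-of-tree debug patch, not a stable interface.
 */
static size_t count_requeues(const char *const *lines, size_t n,
                             unsigned long ino)
{
    size_t hits = 0;

    assert(lines != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const char *p = strstr(lines[i], "requeue_io ");
        unsigned long seen;

        /* %*d skips the number after "requeue_io" without storing it */
        if (p && sscanf(p, "requeue_io %*d: inode %lu", &seen) == 1 &&
            seen == ino)
            hits++;
    }
    return hits;
}
```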



* Re: [PATCH 00/13] writeback bug fixes and simplifications take 2
From: Michael Rubin @ 2008-01-18  7:51 UTC (permalink / raw)
  To: Fengguang Wu; +Cc: Andrew Morton, Peter Zijlstra, linux-fsdevel, linux-kernel

On Jan 15, 2008 4:36 AM, Fengguang Wu <wfg@mail.ustc.edu.cn> wrote:
> Andrew,
>
> This patchset mainly polishes the writeback queuing policies.
> The main goals are:
>
> (1) small files should not be starved by big dirty files
> (2) sync as fast as possible for not-blocked inodes/pages
>     - don't leave them out; no congestion_wait() in between them
> (3) avoid busy iowait for blocked inodes
>     - retry them in the next go of s_io (maybe at the next wakeup of pdflush)
>
> The role of the queues:
>
> s_dirty:   park for dirtied_when expiration
> s_io:      park for io submission
> s_more_io: for big dirty inodes, they will be retried in this run of pdflush
>            (it ensures fairness between small/large files)
> s_more_io_wait: for blocked inodes, they will be picked up in the next run of s_io

Quick question to make sure I get this. Each queue is sorted as such:

s_dirty - sorted by the dirtied_when field
s_io - no explicit sort key; ordered the way we want to process inodes
in sync_sb_inodes
s_more_io - held for later; sorted in the same manner as s_io

Is that it?

mrubin


* Re: [PATCH 00/13] writeback bug fixes and simplifications take 2
From: Fengguang Wu @ 2008-01-18  8:27 UTC (permalink / raw)
  To: Michael Rubin; +Cc: Andrew Morton, Peter Zijlstra, linux-fsdevel, linux-kernel

On Thu, Jan 17, 2008 at 11:51:51PM -0800, Michael Rubin wrote:
> On Jan 15, 2008 4:36 AM, Fengguang Wu <wfg@mail.ustc.edu.cn> wrote:
> > Andrew,
> >
> > This patchset mainly polishes the writeback queuing policies.
> > The main goals are:
> >
> > (1) small files should not be starved by big dirty files
> > (2) sync as fast as possible for not-blocked inodes/pages
> >     - don't leave them out; no congestion_wait() in between them
> > (3) avoid busy iowait for blocked inodes
> >     - retry them in the next go of s_io (maybe at the next wakeup of pdflush)
> >
> > The role of the queues:
> >
> > s_dirty:   park for dirtied_when expiration
> > s_io:      park for io submission
> > s_more_io: for big dirty inodes, they will be retried in this run of pdflush
> >            (it ensures fairness between small/large files)
> > s_more_io_wait: for blocked inodes, they will be picked up in the next run of s_io
> 
> Quick question to make sure I get this. Each queue is sorted as such:
> 
> s_dirty - sorted by the dirtied_when field
> s_io - no explicit sort key; ordered the way we want to process inodes
> in sync_sb_inodes
> s_more_io - held for later; sorted in the same manner as s_io
> 
> Is that it?

Yes, exactly. s_io and s_more_io can be considered one list split
in two, which provides the cursor for sequential iteration.
And s_more_io_wait is simply a container for blocked inodes.
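The ordering Michael describes, together with the view of s_io and s_more_io as one list split by a cursor, can be illustrated with a toy refill step. This is a sketch under stated assumptions: fixed-size arrays replace the kernel's list_head lists, `refill_s_io()` and `struct queue` are hypothetical names, and placing s_more_io leftovers ahead of newly expired inodes is an assumption of the sketch, not a statement about the patchset. s_more_io_wait is left out.

```c
#include <assert.h>
#include <stddef.h>

#define QMAX 32

/* Toy inode: a number and the time it was first dirtied. */
struct toy_inode {
    unsigned long ino;
    unsigned long dirtied_when;
};

/* Toy queue backed by an array; v[0] is the next inode to process. */
struct queue {
    struct toy_inode v[QMAX];
    size_t len;
};

static void q_push(struct queue *q, struct toy_inode in)
{
    assert(q->len < QMAX);
    q->v[q->len++] = in;
}

/*
 * Hypothetical refill of an empty s_io (not the kernel's queue_io()).
 * Returns the number of inodes moved off s_dirty.
 */
static size_t refill_s_io(struct queue *s_io, struct queue *s_more_io,
                          struct queue *s_dirty, unsigned long older_than)
{
    size_t expired = 0;

    assert(s_io->len == 0);   /* refill only once the current pass is done */

    /* 1. leftovers from the previous pass keep their relative order */
    for (size_t i = 0; i < s_more_io->len; i++)
        q_push(s_io, s_more_io->v[i]);
    s_more_io->len = 0;

    /* 2. s_dirty is sorted oldest-first, so expired inodes form a prefix */
    while (expired < s_dirty->len &&
           s_dirty->v[expired].dirtied_when <= older_than) {
        q_push(s_io, s_dirty->v[expired]);
        expired++;
    }

    /* compact s_dirty: keep only the not-yet-expired inodes */
    for (size_t i = expired; i < s_dirty->len; i++)
        s_dirty->v[i - expired] = s_dirty->v[i];
    s_dirty->len -= expired;

    return expired;
}
```

Because s_dirty is kept ordered by dirtied_when, the expiration scan can stop at the first inode dirtied after the cutoff instead of walking the whole list.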



Thread overview: 20+ messages
2008-01-15 12:36 ` [PATCH 00/13] writeback bug fixes and simplifications take 2 Fengguang Wu
2008-01-15 12:36     ` [PATCH 01/13] writeback: revert 2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b Fengguang Wu
2008-01-15 12:36     ` [PATCH 02/13] writeback: clear PAGECACHE_TAG_DIRTY for truncated page in block_write_full_page() Fengguang Wu
2008-01-15 12:36     ` [PATCH 03/13] writeback: introduce writeback_control.more_io Fengguang Wu
2008-01-15 12:36     ` [PATCH 04/13] writeback: introduce super_block.s_more_io_wait Fengguang Wu
2008-01-15 12:36     ` [PATCH 05/13] writeback: merge duplicate code into writeback_some_pages() Fengguang Wu
2008-01-15 12:36     ` [PATCH 06/13] writeback: defer writeback on not-all-pages-written Fengguang Wu
2008-01-15 12:36     ` [PATCH 07/13] writeback: defer writeback on locked inode Fengguang Wu
2008-01-15 12:36     ` [PATCH 08/13] writeback: defer writeback on locked buffers Fengguang Wu
2008-01-15 12:36     ` [PATCH 09/13] writeback: requeue_io() on redirtied inode Fengguang Wu
2008-01-16  8:13       ` David Chinner
2008-01-17  4:22           ` Fengguang Wu
2008-01-15 12:36     ` [PATCH 10/13] writeback: introduce queue_dirty() Fengguang Wu
2008-01-15 12:36     ` [PATCH 11/13] writeback: queue_dirty() on memory-backed bdi Fengguang Wu
2008-01-15 12:36     ` [PATCH 12/13] writeback: remove redirty_tail() Fengguang Wu
2008-01-15 12:36     ` [PATCH 13/13] writeback: cleanup __sync_single_inode() Fengguang Wu
2008-01-15 18:33 ` [PATCH 00/13] writeback bug fixes and simplifications take 2 Michael Rubin
2008-01-16  2:18     ` Fengguang Wu
2008-01-18  7:51 ` Michael Rubin
2008-01-18  8:27     ` Fengguang Wu
