LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: akpm@osdl.org, sct@redhat.com
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH] Fix commit of ordered data buffers
Date: Tue, 13 Sep 2005 17:30:24 +0200	[thread overview]
Message-ID: <20050913153024.GL30108@atrey.karlin.mff.cuni.cz> (raw)

[-- Attachment #1: Type: text/plain, Size: 1444 bytes --]

  Hello,

  attached patch should fix handling of ordered data buffers in
journal_commit_write(). Currently the code simply moves the buffer
from t_sync_data list to t_locked list when the buffer is locked.
But in theory (I'm not sure this currently really happens for ext3
ordered data buffer) the buffer can be locked just for "examination"
purposes and not being sent to disk. Hence it could happen we don't
write some ordered buffer when we should.
  What the patch does is that it writes the buffer when it is dirty
(even if it is locked - hence it can happen we call ll_rw_block()
on the buffer currently under write-out but in this case ll_rw_block()
just waits until IO is complete and returns (as the buffer won't be
dirty anymore) and this race with pdflush should be rare enough not
to affect the performance). The patch also moves buffer to t_locked
list immediately after queing the buffer for write-out (as otherwise
we would have to detect which buffers are already queued and that's not
nice). This changes a bit a logic of t_locked list - unlocked buffer
for the users different from journal_commit_transaction() does not mean
the buffer is already written. But only journal_unmap_buffer() cares
about this changes and I changed that one to check if the buffer is not
dirty.
  Andrew, if you think the change is fine, please put it into -mm so
that it gets some testing.

							Honza

-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs

[-- Attachment #2: jbd-2.6.13-mm1-2-orderedwrite.diff --]
[-- Type: text/plain, Size: 4249 bytes --]

When a buffer is locked it does not mean that write-out is in progress. We
have to check if the buffer is dirty and if it is we have to submit it
for write-out. We unconditionally move the buffer to t_locked_list so
that we don't mistake unprocessed buffer and buffer not yet given to
ll_rw_block(). This subtly changes the meaning of buffer states in
t_locked_list - unlock buffer (for list users different from
journal_commit_transaction()) does not mean it has been written. But
only journal_unmap_buffer() cares and it now checks if the buffer
is not dirty.

Signed-off-by: Jan Kara <jack@suse.cz>

diff -rupX /home/jack/.kerndiffexclude linux-2.6.13-mm1-1-invalidatepage/fs/jbd/commit.c linux-2.6.13-mm1-2-orderedwrite/fs/jbd/commit.c
--- linux-2.6.13-mm1-1-invalidatepage/fs/jbd/commit.c	2005-09-11 22:49:07.000000000 +0200
+++ linux-2.6.13-mm1-2-orderedwrite/fs/jbd/commit.c	2005-09-17 01:53:46.000000000 +0200
@@ -337,19 +337,32 @@ write_out_data:
 		jh = commit_transaction->t_sync_datalist;
 		commit_transaction->t_sync_datalist = jh->b_tnext;
 		bh = jh2bh(jh);
-		if (buffer_locked(bh)) {
-			BUFFER_TRACE(bh, "locked");
+
+		/*
+		 * If the buffer is dirty, it needs a write-out (the write-out
+		 * may be already in progress but that should be rare so giving
+		 * the buffer once more to ll_rw_block() should not affect
+		 * the performance). We move the buffer out of t_sync_datalist
+		 * to t_locked_list to make a progress. This handling of dirty
+		 * buffer has the disadvantage that for functions different
+		 * from journal_commit_transaction() unlocked buffer in
+		 * t_locked_list does not mean it has been written. *BUT* the
+		 * only place that cares is journal_unmap_buffer() and that
+		 * checks that the buffer is not dirty.
+		 *
+		 * If the buffer is locked and not dirty, someone has submitted
+		 * the buffer for IO before us so we just move the buffer to
+		 * t_locked_list (to wait for IO completion).
+		 */
+		if (buffer_locked(bh) || buffer_dirty(bh)) {
+			BUFFER_TRACE(bh, "locked or dirty");
 			if (!inverted_lock(journal, bh))
 				goto write_out_data;
 			__journal_temp_unlink_buffer(jh);
 			__journal_file_buffer(jh, commit_transaction,
 						BJ_Locked);
 			jbd_unlock_bh_state(bh);
-			if (lock_need_resched(&journal->j_list_lock)) {
-				spin_unlock(&journal->j_list_lock);
-				goto write_out_data;
-			}
-		} else {
+
 			if (buffer_dirty(bh)) {
 				BUFFER_TRACE(bh, "start journal writeout");
 				get_bh(bh);
@@ -363,19 +376,19 @@ write_out_data:
 					bufs = 0;
 					goto write_out_data;
 				}
-			} else {
-				BUFFER_TRACE(bh, "writeout complete: unfile");
-				if (!inverted_lock(journal, bh))
-					goto write_out_data;
-				__journal_unfile_buffer(jh);
-				jbd_unlock_bh_state(bh);
-				journal_remove_journal_head(bh);
-				put_bh(bh);
-				if (lock_need_resched(&journal->j_list_lock)) {
-					spin_unlock(&journal->j_list_lock);
-					goto write_out_data;
-				}
 			}
+		} else {
+			BUFFER_TRACE(bh, "writeout complete: unfile");
+			if (!inverted_lock(journal, bh))
+				goto write_out_data;
+			__journal_unfile_buffer(jh);
+			jbd_unlock_bh_state(bh);
+			journal_remove_journal_head(bh);
+			put_bh(bh);
+		}
+		if (lock_need_resched(&journal->j_list_lock)) {
+			spin_unlock(&journal->j_list_lock);
+			goto write_out_data;
 		}
 	}
 
diff -rupX /home/jack/.kerndiffexclude linux-2.6.13-mm1-1-invalidatepage/fs/jbd/transaction.c linux-2.6.13-mm1-2-orderedwrite/fs/jbd/transaction.c
--- linux-2.6.13-mm1-1-invalidatepage/fs/jbd/transaction.c	2005-09-11 22:49:07.000000000 +0200
+++ linux-2.6.13-mm1-2-orderedwrite/fs/jbd/transaction.c	2005-09-17 01:53:46.000000000 +0200
@@ -1814,11 +1814,11 @@ static int journal_unmap_buffer(journal_
 			}
 		}
 	} else if (transaction == journal->j_committing_transaction) {
-		if (jh->b_jlist == BJ_Locked) {
+		if (jh->b_jlist == BJ_Locked && !buffer_dirty(bh)) {
 			/*
 			 * The buffer is on the committing transaction's locked
-			 * list.  We have the buffer locked, so I/O has
-			 * completed.  So we can nail the buffer now.
+			 * list.  We have the buffer locked and !dirty, so I/O
+			 * has completed.  So we can nail the buffer now.
 			 */
 			may_free = __dispose_buffer(jh, transaction);
 			goto zap_buffer;

             reply	other threads:[~2005-09-13 15:28 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-09-13 15:30 Jan Kara [this message]
2005-09-14  1:43 ` Andrew Morton
2005-09-14 12:03   ` Jan Kara
2005-09-14 18:30     ` Andrew Morton
2005-09-15  7:23       ` Jan Kara
2005-09-14 13:02   ` Stephen C. Tweedie
2006-09-11 21:05 Jan Kara
2006-09-11 22:12 ` Andrew Morton
2006-09-28  8:31 ` Zhang, Yanmin
2006-09-28 21:35   ` Jan Kara
2006-09-29  1:27     ` Zhang, Yanmin
2006-09-29  9:21       ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20050913153024.GL30108@atrey.karlin.mff.cuni.cz \
    --to=jack@suse.cz \
    --cc=akpm@osdl.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sct@redhat.com \
    --subject='Re: [PATCH] Fix commit of ordered data buffers' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).