From: Jan Kara <jack@suse.cz>
To: Arthur Jones <ajones@riverbed.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
"sandeen@redhat.com" <sandeen@redhat.com>,
"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
"sct@redhat.com" <sct@redhat.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] ext3: wait on all pending commits in ext3_sync_fs
Date: Fri, 19 Dec 2008 00:17:07 +0100
Message-ID: <20081218231707.GB20092@atrey.karlin.mff.cuni.cz>
In-Reply-To: <20081103201428.GB30565@ajones-laptop.nbttech.com>
Hello,
I'm sorry I'm replying late but I got time to react to this only now...
> On Mon, Nov 03, 2008 at 11:33:18AM -0800, Andrew Morton wrote:
> > [...]
> > > --- a/fs/ext3/super.c
> > > +++ b/fs/ext3/super.c
> > > @@ -2392,7 +2392,13 @@ static int ext3_sync_fs(struct super_block *sb, int wait)
> > > if (journal_start_commit(EXT3_SB(sb)->s_journal, &target)) {
> > > if (wait)
> > > log_wait_commit(EXT3_SB(sb)->s_journal, target);
> > > - }
> > > + } else if (wait)
> > > + /*
> > > + * We may have a commit in progress, clear it out
> > > + * before we go on...
> > > + */
> > > + ext3_force_commit(sb);
> > > +
> > > return 0;
> > > }
> >
> > Can we do
> >
> > sb->s_dirt = 0;
> > if (wait)
> > ext3_force_commit(...);
> > else
> > journal_start_commit(...);
>
> I think this is what you had in mind:
>
>
> diff --git a/fs/ext3/super.c b/fs/ext3/super.c
> index 18eaa78..2b48b85 100644
> --- a/fs/ext3/super.c
> +++ b/fs/ext3/super.c
> @@ -2386,13 +2386,12 @@ static void ext3_write_super (struct super_block * sb)
>
> static int ext3_sync_fs(struct super_block *sb, int wait)
> {
> - tid_t target;
> -
> sb->s_dirt = 0;
> - if (journal_start_commit(EXT3_SB(sb)->s_journal, &target)) {
> - if (wait)
> - log_wait_commit(EXT3_SB(sb)->s_journal, target);
> - }
> + if (wait)
> + ext3_force_commit(sb);
> + else
> + journal_start_commit(EXT3_SB(sb)->s_journal, NULL);
> +
> return 0;
> }
>
> I tried this and it too fixes the problem. FWIW I agree it
> looks better...
Well, shouldn't we rather fix what journal_start_commit() returns?
The interface, which returns 1 when a transaction is already committing or
a transaction commit has just been started but 0 when we race with
somebody starting the commit, is fairly unusable. Moreover,
ext3_force_commit() will unnecessarily create a new sync transaction and
commit it if there's no transaction running, which is quite expensive
(even merging an empty sync handle is not free because of sync
transaction batching). But this is a minor problem since we probably
don't care too much about sync() performance - BTW, this is probably the
cause of bug 12224, isn't it?
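To make the return-value problem concrete, here is a minimal user-space sketch of the two behaviours. It is only a model, not kernel code: struct model_journal, model_log_start_commit(), start_commit_old(), start_commit_new(), raced_journal() and model_selfcheck() are hypothetical names made up for illustration, locking is left out, and tid wraparound is ignored.

```c
/* User-space model only; this is NOT the kernel's jbd code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int tid_t;

/* Hypothetical, heavily simplified stand-in for journal_t. */
struct model_journal {
	bool  has_running;     /* models j_running_transaction != NULL */
	tid_t running_tid;
	bool  has_committing;  /* models j_committing_transaction != NULL */
	tid_t committing_tid;
	tid_t commit_request;  /* models j_commit_request */
};

/* Models __log_start_commit(): 1 only if THIS call queued the commit.
 * Simplification: plain >= instead of a wraparound-safe tid compare. */
int model_log_start_commit(struct model_journal *j, tid_t target)
{
	if (j->commit_request >= target)
		return 0;      /* somebody already requested this commit */
	j->commit_request = target;
	return 1;
}

/* Behaviour before the patch: reports 0 when we lose the race. */
int start_commit_old(struct model_journal *j, tid_t *ptid)
{
	int ret = 0;

	if (j->has_running) {
		tid_t tid = j->running_tid;

		ret = model_log_start_commit(j, tid);
		if (ret && ptid)
			*ptid = tid;
	} else if (j->has_committing && ptid) {
		*ptid = j->committing_tid;
		ret = 1;
	}
	return ret;
}

/* Behaviour after the patch: 1 whenever a commit is queued or running. */
int start_commit_new(struct model_journal *j, tid_t *ptid)
{
	int ret = 0;

	if (j->has_running) {
		tid_t tid = j->running_tid;

		model_log_start_commit(j, tid);
		if (ptid)
			*ptid = tid;
		ret = 1;
	} else if (j->has_committing) {
		if (ptid)
			*ptid = j->committing_tid;
		ret = 1;
	}
	return ret;
}

/* A journal where somebody else already queued the commit of tid 7. */
struct model_journal raced_journal(void)
{
	struct model_journal j = {
		.has_running = true,  .running_tid = 7,
		.has_committing = false, .committing_tid = 0,
		.commit_request = 7,
	};
	return j;
}

/* Checks the difference between the two variants in the raced case. */
void model_selfcheck(void)
{
	struct model_journal j = raced_journal();
	tid_t target = 0;

	assert(start_commit_old(&j, &target) == 0); /* caller would not wait */
	j = raced_journal();
	assert(start_commit_new(&j, &target) == 1); /* caller gets a tid */
	assert(target == 7);
}
```

In the raced case the old variant returns 0 and leaves the tid untouched, so a caller written like ext3_sync_fs() has nothing to pass to log_wait_commit() and skips the wait. The new variant returns 1 and fills in the tid.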
BTW: ocfs2 would need fixing as well if done your way since its
sync_fs function has been copied over from ext3.
To summarize, I'd rather see a patch like the one below (untested) going in
and your patch reverted... Opinions? I can cook up a JBD2 version of
the patch in case we agree to go this way.
Honza
From 0a578ba1b56fe655570ee6dad41748863a120dbc Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Fri, 19 Dec 2008 00:05:34 +0100
Subject: [PATCH] jbd: Fix return value of journal_start_commit()
journal_start_commit() returns 1 if either a transaction is committing or the
function has queued a transaction commit. But it returns 0 if we raced with
somebody queueing the transaction commit as well. This resulted in
ext3_sync_fs() not functioning correctly (description from Arthur Jones):
In the case of a data=ordered umount with pending long symlinks which are
delayed due to a long list of other I/O on the backing block device, this
causes the buffer associated with the long symlinks to not be moved to the
inode dirty list in the second phase of fsync_super. Then, before they can be
dirtied again, kjournald exits, seeing the UMOUNT flag and the dirty pages are
never written to the backing block device, causing long symlink corruption and
exposing new or previously freed block data to userspace.
This can be reproduced with a script created by Eric Sandeen
<sandeen@redhat.com>:
#!/bin/bash
umount /mnt/test2
mount /dev/sdb4 /mnt/test2
rm -f /mnt/test2/*
dd if=/dev/zero of=/mnt/test2/bigfile bs=1M count=512
touch /mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryveryveryveryveryveryverylongfilename
ln -s /mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryveryveryveryveryveryverylongfilename \
/mnt/test2/link
umount /mnt/test2
mount /dev/sdb4 /mnt/test2
ls /mnt/test2/
This patch fixes journal_start_commit() to always return 1 when there's
a transaction committing or queued for commit.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/jbd/journal.c | 17 +++++++++++------
1 files changed, 11 insertions(+), 6 deletions(-)
diff --git a/fs/jbd/journal.c b/fs/jbd/journal.c
index 9e4fa52..e79c078 100644
--- a/fs/jbd/journal.c
+++ b/fs/jbd/journal.c
@@ -427,7 +427,7 @@ int __log_space_left(journal_t *journal)
}
/*
- * Called under j_state_lock. Returns true if a transaction was started.
+ * Called under j_state_lock. Returns true if a transaction commit was started.
*/
int __log_start_commit(journal_t *journal, tid_t target)
{
@@ -495,7 +495,8 @@ int journal_force_commit_nested(journal_t *journal)
/*
* Start a commit of the current running transaction (if any). Returns true
- * if a transaction was started, and fills its tid in at *ptid
+ * if a transaction is going to be committed (or is currently already
+ * committing), and fills its tid in at *ptid
*/
int journal_start_commit(journal_t *journal, tid_t *ptid)
{
@@ -505,15 +506,19 @@ int journal_start_commit(journal_t *journal, tid_t *ptid)
if (journal->j_running_transaction) {
tid_t tid = journal->j_running_transaction->t_tid;
- ret = __log_start_commit(journal, tid);
- if (ret && ptid)
+ __log_start_commit(journal, tid);
+ /* There's a running transaction and we've just made sure
+ * its commit has been scheduled. */
+ if (ptid)
*ptid = tid;
- } else if (journal->j_committing_transaction && ptid) {
+ ret = 1;
+ } else if (journal->j_committing_transaction) {
/*
* If ext3_write_super() recently started a commit, then we
* have to wait for completion of that transaction
*/
- *ptid = journal->j_committing_transaction->t_tid;
+ if (ptid)
+ *ptid = journal->j_committing_transaction->t_tid;
ret = 1;
}
spin_unlock(&journal->j_state_lock);
--
1.6.0.2