From: Jan Kara <jack@suse.cz>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org, supriyak@in.ibm.com
Subject: Re: [PATCH] Fix private_list handling
Date: Mon, 14 Jan 2008 19:14:10 +0100
Message-ID: <20080114181410.GI4214@duck.suse.cz>
In-Reply-To: <20080111153354.324a5af9.akpm@linux-foundation.org>

On Fri 11-01-08 15:33:54, Andrew Morton wrote:
> On Fri, 11 Jan 2008 15:21:31 +0100
> Jan Kara <jack@suse.cz> wrote:
> 
> > On Thu 10-01-08 16:36:35, Andrew Morton wrote:
> > > On Thu, 10 Jan 2008 16:55:13 +0100
> > > Jan Kara <jack@suse.cz> wrote:
> > > 
> > > >   Hi,
> > > > 
> > > >   sorry for the previous empty email...
> > > > 
> > > >   Supriya noted in his testing that sometimes buffers removed by
> > > > __remove_assoc_queue() don't have b_assoc_map set (and thus an IO error
> > > > won't be properly propagated). Actually, looking further into the code, I
> > > > found a few more races. The patch below should fix them. It survived a
> > > > beating with LTP and fsstress on an ext2 filesystem on my testing machine,
> > > > so it should be reasonably bug-free... Andrew, would you put the patch into
> > > > -mm? Thanks.
> > > > 
> > > > 								Honza
> > > > -- 
> > > > Jan Kara <jack@suse.cz>
> > > > SUSE Labs, CR
> > > > ---
> > > > 
> > > > There are two possible races in the handling of private_list in the
> > > > buffer cache.
> > > > 1) When fsync_buffers_list() processes a private_list, it clears
> > > > b_assoc_map and moves the buffer to its own private list. Now drop_buffers()
> > > > comes along, sees that the buffer is on a list, and so calls
> > > > __remove_assoc_queue(), which complains about b_assoc_map being cleared
> > > > (as it then cannot propagate a possible IO error). This race has actually
> > > > been observed in the wild.
> > > 
> > > private_lock should prevent this race.
> > > 
> > > Which call to drop_buffers() is the culprit?  The first one in
> > > try_to_free_buffers(), I assume?  The "can this still happen?" one?
> > > 
> > > If so, then apparently it can still happen.  How?  Perhaps this is a bug.
> >   Good question, but I don't think so. The problem is that
> > fsync_buffers_list() drops private_lock, e.g. while it does
> > wait_on_buffer(). Buffers on the temporary list fsync_buffers_list()
> > constructs have b_assoc_map set to NULL, but the test
> > !list_empty(&bh->b_assoc_buffers) still succeeds for them, so
> > __remove_assoc_queue() gets called and complains.
> >   We could also silence the warning by leaving b_assoc_map set when we
> > move the buffer to the constructed list.
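  To spell the window out, here is a rough interleaving of race 1 (simplified;
the names follow fs/buffer.c of that time, and some steps are elided):

	CPU0: fsync_buffers_list()            CPU1: try_to_free_buffers()
	-----------------------------------   -----------------------------------
	spin_lock(lock)   /* private_lock */
	__remove_assoc_queue(bh)
	    /* bh->b_assoc_map is now NULL */
	list_add(&bh->b_assoc_buffers, &tmp)
	    /* bh is on a list again */
	...
	spin_unlock(lock)
	  /* lock dropped around the block
	     IO for some later buffer;
	     meanwhile bh's own IO completes
	     and bh is no longer busy */
	                                      spin_lock(&mapping->private_lock)
	                                      drop_buffers() sees
	                                        !list_empty(&bh->b_assoc_buffers)
	                                      __remove_assoc_queue(bh)
	                                        -> complains: b_assoc_map is NULL,
	                                           so a write error on bh can no
	                                           longer be propagated to the
	                                           mapping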
> 
> Could fsync_buffers_list() leave the buffer not on a list when it does the
> ll_rw_block() and only add it to tmp afterwards if it sees that the buffer
> is still not on a list?
> 
> I guess not, as it still needs to find the buffer to wait upon it.
> 
> > But given the problem below
> > I've decided to do a more complete cleanup of the code.
> 
> Well.  Large-scale changes to long-established code like this are a last
> resort.  We should fully investigate little local fixups first.
> 
> > > > 2) When fsync_buffers_list() processes a private_list,
> > > > mark_buffer_dirty_inode() can be called on a bh which is already on the
> > > > private list of fsync_buffers_list(). As the buffer is on some list (note
> > > > that the check is performed without private_lock), it is not re-added to
> > > > the mapping's private_list, and after fsync_buffers_list() finishes we have
> > > > a dirty buffer which should be on private_list but isn't. This race has not
> > > > been reported, probably because most (but not all) callers of
> > > > mark_buffer_dirty_inode() hold i_mutex and are thus serialized with fsync().
> > > 
> > > Maybe fsync_buffers_list should put the buffer back onto private_list if it
> > > got dirtied again.
> >   Yes, that's what it does in my new version. Only the locking is a bit
> > subtle if we want to avoid taking private_lock in mark_buffer_dirty_inode()
> > when the buffer is already on some list...
> 
> I fear rewrites in there.  It took five years to find this bug.  How long
> will it take to find new ones which get added?
> 
> Sigh.  lists do suck for this sort of thing.
  OK, below is a minimal fix for the problems...

										Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
---

There are two possible races in the handling of private_list in the buffer
cache.

1) When fsync_buffers_list() processes a private_list, it clears b_assoc_map
and moves the buffer to its own private list (the local tmp list). Now
drop_buffers() comes along, sees that the buffer is on a list, and so calls
__remove_assoc_queue(), which complains about b_assoc_map being cleared (as it
then cannot propagate a possible IO error). This race has actually been
observed in the wild.

2) When fsync_buffers_list() processes a private_list, mark_buffer_dirty_inode()
can be called on a bh which is already on the private list of
fsync_buffers_list(). As the buffer is on some list (note that the check is
performed without private_lock), it is not re-added to the mapping's
private_list, and after fsync_buffers_list() finishes we have a dirty buffer
which should be on private_list but isn't. This race has not been reported,
probably because most (but not all) callers of mark_buffer_dirty_inode() hold
i_mutex and are thus serialized with fsync().
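
The check in question is the lockless list_empty() test in
mark_buffer_dirty_inode(); roughly, abridged from the fs/buffer.c of that time
(inessential details omitted, so this is a sketch rather than the exact code):

	void mark_buffer_dirty_inode(struct buffer_head *bh, struct inode *inode)
	{
		struct address_space *mapping = inode->i_mapping;
		struct address_space *buffer_mapping = bh->b_page->mapping;

		mark_buffer_dirty(bh);
		/* ... mapping->assoc_mapping setup omitted ... */
		if (list_empty(&bh->b_assoc_buffers)) {	/* checked without the lock */
			spin_lock(&buffer_mapping->private_lock);
			list_move_tail(&bh->b_assoc_buffers,
					&mapping->private_list);
			bh->b_assoc_map = mapping;
			spin_unlock(&buffer_mapping->private_lock);
		}
	}

While fsync_buffers_list() has the buffer parked on its tmp list, list_empty()
is false, so the if-body is skipped and the freshly dirtied buffer is never put
back on the mapping's private_list.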

Fix these issues by not clearing b_assoc_map when fsync_buffers_list() moves a
buffer to its dedicated list, and by reinserting the buffer into private_list
when it is found dirty after the IO has completed.

Signed-off-by: Jan Kara <jack@suse.cz>

diff --git a/fs/buffer.c b/fs/buffer.c
index 7249e01..3ffb2b6 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -794,6 +794,7 @@ static int fsync_buffers_list(spinlock_t *lock, struct list_head *list)
 {
 	struct buffer_head *bh;
 	struct list_head tmp;
+	struct address_space *mapping;
 	int err = 0, err2;
 
 	INIT_LIST_HEAD(&tmp);
@@ -801,9 +802,11 @@ static int fsync_buffers_list(spinlock_t *lock, struct list_head *list)
 	spin_lock(lock);
 	while (!list_empty(list)) {
 		bh = BH_ENTRY(list->next);
+		mapping = bh->b_assoc_map;
 		__remove_assoc_queue(bh);
 		if (buffer_dirty(bh) || buffer_locked(bh)) {
 			list_add(&bh->b_assoc_buffers, &tmp);
+			bh->b_assoc_map = mapping;
 			if (buffer_dirty(bh)) {
 				get_bh(bh);
 				spin_unlock(lock);
@@ -828,8 +831,13 @@ static int fsync_buffers_list(spinlock_t *lock, struct list_head *list)
 		wait_on_buffer(bh);
 		if (!buffer_uptodate(bh))
 			err = -EIO;
-		brelse(bh);
 		spin_lock(lock);
+		if (buffer_dirty(bh) && list_empty(&bh->b_assoc_buffers)) {
+			BUG_ON(!bh->b_assoc_map);
+			list_add(&bh->b_assoc_buffers,
+				 &bh->b_assoc_map->private_list);
+		}
+		brelse(bh);
 	}
 	
 	spin_unlock(lock);
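
With this change, the first race resolves harmlessly: drop_buffers() may still
find the buffer on the tmp list, but __remove_assoc_queue() now sees a valid
b_assoc_map, so it can record a possible write error against the mapping and
has nothing to complain about. The second race is handled by the new check
after wait_on_buffer(): if the buffer was redirtied while it was off
private_list (buffer_dirty(bh) && list_empty(&bh->b_assoc_buffers)), it is put
back on its mapping's private_list before the reference is dropped, so a
subsequent fsync() will still write it out.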
