From: Josef Bacik <jbacik@fb.com>
To: <linux-fsdevel@vger.kernel.org>, <david@fromorbit.com>,
<viro@zeniv.linux.org.uk>, <jack@suse.cz>,
<linux-kernel@vger.kernel.org>
Cc: Dave Chinner <dchinner@redhat.com>
Subject: [PATCH 4/9] sync: serialise per-superblock sync operations
Date: Tue, 10 Mar 2015 15:45:19 -0400
Message-ID: <1426016724-23912-5-git-send-email-jbacik@fb.com>
In-Reply-To: <1426016724-23912-1-git-send-email-jbacik@fb.com>
From: Dave Chinner <dchinner@redhat.com>
When competing sync(2) calls walk the same filesystem, they need to
walk the list of inodes on the superblock to find all the inodes
that we need to wait for IO completion on. However, when multiple
wait_sb_inodes() calls do this at the same time, they contend on
the inode_sb_list_lock and the contention causes system-wide
slowdowns. In effect, concurrent sync(2) calls can take longer and
burn more CPU than if they were serialised.
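To make the effect concrete, the contention can be provoked from
userspace with nothing more than overlapping sync(2) calls. The
sketch below is a hypothetical reproducer (illustrative only, not
part of this series); it forks a number of children that each call
sync() and reports the elapsed wall-clock time:

/*
 * Hypothetical reproducer: time N concurrent sync(2) calls.
 * Illustrative only -- not part of this patch.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/wait.h>

int main(int argc, char **argv)
{
	int i, nproc = argc > 1 ? atoi(argv[1]) : 8;
	struct timespec start, end;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (i = 0; i < nproc; i++) {
		if (fork() == 0) {
			sync();		/* each call walks the per-sb inode lists */
			_exit(0);
		}
	}
	while (wait(NULL) > 0)		/* reap all children */
		;
	clock_gettime(CLOCK_MONOTONIC, &end);

	printf("%d concurrent sync(2) calls: %.3f s\n", nproc,
	       (end.tv_sec - start.tv_sec) +
	       (end.tv_nsec - start.tv_nsec) / 1e9);
	return 0;
}

On a filesystem with many cached inodes, running this with an
increasing process count should show the disproportionate slowdown
described above on an unpatched kernel.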
Stop the worst of the contention by adding a per-sb mutex to wrap
around wait_sb_inodes() so that we only execute one sync(2) IO
completion walk per superblock at a time and hence avoid
contention being triggered by concurrent sync(2) calls.
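The ordering argument is easiest to see in a stripped-down model.
The following userspace analogy (a sketch, assuming a mutex plus a
shared issued/completed count captures the essentials; it is not the
kernel code) shows why the order in which blocked callers are
granted the lock does not matter:

#include <pthread.h>

/*
 * Toy model of s_sync_lock: issue_io() stands in for queueing
 * writeback, wait_all() for wait_sb_inodes(). Each wait_all()
 * caller returns only after everything issued before it entered
 * has been completed -- by itself or by an earlier lock holder --
 * so lock grant order is irrelevant to the sync(2) guarantee.
 */
static pthread_mutex_t sync_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static long issued, completed;

static void issue_io(void)
{
	pthread_mutex_lock(&state_lock);
	issued++;			/* IO queued before some later wait */
	pthread_mutex_unlock(&state_lock);
}

static void wait_all(void)
{
	pthread_mutex_lock(&sync_lock);	/* one completion walk at a time */
	pthread_mutex_lock(&state_lock);
	while (completed < issued)
		completed++;		/* stands in for waiting on one inode */
	pthread_mutex_unlock(&state_lock);
	pthread_mutex_unlock(&sync_lock);
}

Waiters queue on sync_lock instead of racing down the inode list
together, which is the behaviour the comment added in the hunk below
describes.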
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c  | 11 +++++++++++
 fs/super.c         |  1 +
 include/linux/fs.h |  2 ++
 3 files changed, 14 insertions(+)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 9780f6c..dcfe21c 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1288,6 +1288,15 @@ out_unlock_inode:
 }
 EXPORT_SYMBOL(__mark_inode_dirty);
 
+/*
+ * The @s_sync_lock is used to serialise concurrent sync operations
+ * to avoid lock contention problems with concurrent wait_sb_inodes() calls.
+ * Concurrent callers will block on the s_sync_lock rather than doing contending
+ * walks. The queueing maintains the behaviour sync(2) requires: all the IO that
+ * has been issued up to the time this function is entered is guaranteed to be
+ * completed by the time we have gained the lock and waited for all IO that is
+ * in progress, regardless of the order callers are granted the lock.
+ */
 static void wait_sb_inodes(struct super_block *sb)
 {
 	struct inode *inode, *old_inode = NULL;
 
@@ -1298,6 +1307,7 @@ static void wait_sb_inodes(struct super_block *sb)
 	 */
 	WARN_ON(!rwsem_is_locked(&sb->s_umount));
 
+	mutex_lock(&sb->s_sync_lock);
 	spin_lock(&sb->s_inode_list_lock);
 
 	/*
@@ -1339,6 +1349,7 @@ static void wait_sb_inodes(struct super_block *sb)
 	}
 	spin_unlock(&sb->s_inode_list_lock);
 	iput(old_inode);
+	mutex_unlock(&sb->s_sync_lock);
 }
 
 /**
diff --git a/fs/super.c b/fs/super.c
index 85d6a62..6a05d94 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -190,6 +190,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags)
 	s->s_flags = flags;
 	INIT_HLIST_NODE(&s->s_instances);
 	INIT_HLIST_BL_HEAD(&s->s_anon);
+	mutex_init(&s->s_sync_lock);
 	INIT_LIST_HEAD(&s->s_inodes);
 	spin_lock_init(&s->s_inode_list_lock);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 33a450b..4418693 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1311,6 +1311,8 @@ struct super_block {
 	struct list_lru	s_inode_lru ____cacheline_aligned_in_smp;
 	struct rcu_head	rcu;
 
+	struct mutex	s_sync_lock;	/* sync serialisation lock */
+
 	/*
 	 * Indicates how deep in a filesystem stack this SB is
 	 */
--
1.9.3
Thread overview: 24+ messages
2015-03-10 19:45 [PATCH 0/9] Sync and VFS scalability improvements Josef Bacik
2015-03-10 19:45 ` [PATCH 1/9] writeback: plug writeback at a high level Josef Bacik
2015-03-10 19:45 ` [PATCH 2/9] inode: add IOP_NOTHASHED to avoid inode hash lock in evict Josef Bacik
2015-03-12 9:52 ` Al Viro
2015-03-12 12:18 ` [PATCH] inode: add hlist_fake to avoid the " Josef Bacik
2015-03-12 12:20 ` [PATCH] inode: add hlist_fake to avoid the inode hash lock in evict V2 Josef Bacik
2015-03-14 7:00 ` Jan Kara
2015-03-12 12:24 ` [PATCH 2/9] inode: add IOP_NOTHASHED to avoid inode hash lock in evict Josef Bacik
2015-03-10 19:45 ` [PATCH 3/9] inode: convert inode_sb_list_lock to per-sb Josef Bacik
2015-03-10 19:45 ` Josef Bacik [this message]
2015-03-10 19:45 ` [PATCH 5/9] inode: rename i_wb_list to i_io_list Josef Bacik
2015-03-10 19:45 ` [PATCH 6/9] bdi: add a new writeback list for sync Josef Bacik
2015-03-16 10:14 ` Jan Kara
2015-03-10 19:45 ` [PATCH 7/9] writeback: periodically trim the writeback list Josef Bacik
2015-03-16 10:16 ` Jan Kara
2015-03-16 11:43 ` Jan Kara
2015-03-10 19:45 ` [PATCH 8/9] inode: convert per-sb inode list to a list_lru Josef Bacik
2015-03-16 12:27 ` Jan Kara
2015-03-16 15:34 ` Josef Bacik
2015-03-16 15:48 ` Jan Kara
2015-03-10 19:45 ` [PATCH 9/9] inode: don't softlockup when evicting inodes Josef Bacik
2015-03-16 12:31 ` Jan Kara
2015-03-16 11:39 ` [PATCH 0/9] Sync and VFS scalability improvements Jan Kara
2015-03-25 11:18 ` Mel Gorman