From: Dave Hansen <hansendc@us.ibm.com>
To: linux-kernel@vger.kernel.org
Cc: akpm@osdl.org, hch@infradead.org, Dave Hansen <hansendc@us.ibm.com>
Subject: [PATCH 03/22] record when sb_writer_count elevated for inode
Date: Fri, 09 Feb 2007 14:53:31 -0800
Message-ID: <20070209225331.4457B8CB@localhost.localdomain>
In-Reply-To: <20070209225329.27619A62@localhost.localdomain>


There are a number of filesystems that do iput()s without first
having messed with i_nlink.  In order to keep from accidentally
decrementing the superblock writer count for these, we record
when the count is bumped up, so that we can properly balance
it.
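
As an illustration only, the pairing described above can be modelled in
user space.  This is a minimal sketch, not kernel code: every model_*
name and MODEL_AWAITING_FINAL_IPUT are hypothetical stand-ins for the
patch's check_nlink(), drop_nlink(), iput_final() and
I_AWAITING_FINAL_IPUT, and the s_mnt_writers_lock locking is left out.

```c
#include <assert.h>

/* Hypothetical user-space model; the names only mirror the patch. */
#define MODEL_AWAITING_FINAL_IPUT 256

struct model_sb {
	long writers;		/* stands in for sb->s_writers */
};

struct model_inode {
	unsigned int nlink;	/* stands in for i_nlink */
	unsigned long state;	/* stands in for i_state */
	struct model_sb *sb;
};

/* When the link count is zero, bump the count and record that we did. */
static void model_check_nlink(struct model_inode *inode)
{
	if (inode->nlink)
		return;
	inode->state |= MODEL_AWAITING_FINAL_IPUT;
	inode->sb->writers++;
}

static void model_drop_nlink(struct model_inode *inode)
{
	inode->nlink--;
	model_check_nlink(inode);
}

/*
 * Undo the bump only if this inode recorded one.  Clearing the flag
 * is a choice of this model; the patch reads the flag before calling
 * drop() and does not clear it.
 */
static void model_iput_final(struct model_inode *inode)
{
	if (inode->state & MODEL_AWAITING_FINAL_IPUT) {
		inode->state &= ~MODEL_AWAITING_FINAL_IPUT;
		inode->sb->writers--;
	}
}

/* Put one unlinked and one untouched inode; return the final count. */
static long model_run(void)
{
	struct model_sb sb = { 0 };
	struct model_inode unlinked = { 1, 0, &sb };
	struct model_inode untouched = { 1, 0, &sb };

	model_drop_nlink(&unlinked);	/* nlink 1 -> 0, count bumped */
	model_iput_final(&untouched);	/* no flag, count untouched */
	model_iput_final(&unlinked);	/* flag set, bump undone */
	return sb.writers;
}
```

In this model the inode whose link count was never touched leaves the
count alone on its final put, so the count ends up balanced.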

I first tried to do this by assuming that, for each drop_nlink() to
zero, there was exactly one call to iput_final().  But there are
a number of cases where this isn't true, especially in error handling
code.  Even if all of the filesystems were fixed up, it would be easy
to introduce new bugs that unbalance the mnt writer count.  This patch
trades that possibility for the chance that we will miss an i_nlink--,
and not bump the sb writer count.
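
To make the trade concrete, here is a small user-space sketch that
compares the two rules for an inode whose link count reached zero
without a recorded bump.  It is only one reading of the first attempt
described above, and every toy_* name is hypothetical.

```c
#include <assert.h>

struct toy_inode {
	unsigned int nlink;
	int bumped;	/* plays the role of I_AWAITING_FINAL_IPUT */
};

/* Naive rule: every final iput of a zero-link inode drops the count. */
static long toy_naive_final_iput(const struct toy_inode *inode, long writers)
{
	return inode->nlink == 0 ? writers - 1 : writers;
}

/* Flag rule: drop the count only if a bump was recorded. */
static long toy_flagged_final_iput(const struct toy_inode *inode, long writers)
{
	return inode->bumped ? writers - 1 : writers;
}

/* A filesystem zeroed the link count directly, so no bump was recorded. */
static struct toy_inode toy_missed_bump(void)
{
	struct toy_inode inode = { 0, 0 };

	return inode;
}
```

Starting from a count of zero, the naive rule drives the count
negative for this inode, while the flag rule leaves it at zero.  The
cost is that the write for this one inode goes untracked.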

I like the idea of screwing up the writeout of a single inode better
than causing a global superblock count imbalance that will affect
all inodes on the superblock.

Also, since this is the first non-trivial use of the inc/drop_nlink()
functions, add some kernel docs for them.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---

 lxc-dave/fs/inode.c         |    7 +++++
 lxc-dave/fs/libfs.c         |    1 
 lxc-dave/include/linux/fs.h |   58 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 66 insertions(+)

diff -puN fs/inode.c~04-24-record-when-sb-writer-count-elevated-for-inode fs/inode.c
--- lxc/fs/inode.c~04-24-record-when-sb-writer-count-elevated-for-inode	2007-02-09 14:26:48.000000000 -0800
+++ lxc-dave/fs/inode.c	2007-02-09 14:26:48.000000000 -0800
@@ -1097,10 +1097,17 @@ static inline void iput_final(struct ino
 {
 	const struct super_operations *op = inode->i_sb->s_op;
 	void (*drop)(struct inode *) = generic_drop_inode;
+	int must_drop_sb_write = (inode->i_state & I_AWAITING_FINAL_IPUT);
+	struct super_block *sb = inode->i_sb;
 
 	if (op && op->drop_inode)
 		drop = op->drop_inode;
 	drop(inode);
+	if (must_drop_sb_write) {
+		spin_lock(&sb->s_mnt_writers_lock);
+		sb->s_writers--;
+		spin_unlock(&sb->s_mnt_writers_lock);
+	}
 }
 
 /**
diff -puN fs/libfs.c~04-24-record-when-sb-writer-count-elevated-for-inode fs/libfs.c
--- lxc/fs/libfs.c~04-24-record-when-sb-writer-count-elevated-for-inode	2007-02-09 14:26:48.000000000 -0800
+++ lxc-dave/fs/libfs.c	2007-02-09 14:26:48.000000000 -0800
@@ -388,6 +388,7 @@ int simple_fill_super(struct super_block
 	 * because the root inode is 1, the files array must not contain an
 	 * entry at index 1
 	 */
+	inode->i_state |= I_AWAITING_FINAL_IPUT;
 	inode->i_ino = 1;
 	inode->i_mode = S_IFDIR | 0755;
 	inode->i_uid = inode->i_gid = 0;
diff -puN include/linux/fs.h~04-24-record-when-sb-writer-count-elevated-for-inode include/linux/fs.h
--- lxc/include/linux/fs.h~04-24-record-when-sb-writer-count-elevated-for-inode	2007-02-09 14:26:48.000000000 -0800
+++ lxc-dave/include/linux/fs.h	2007-02-09 14:26:48.000000000 -0800
@@ -1230,6 +1230,7 @@ struct super_operations {
 #define I_CLEAR			32
 #define I_NEW			64
 #define I_WILL_FREE		128
+#define I_AWAITING_FINAL_IPUT		256
 
 #define I_DIRTY (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES)
 
@@ -1244,6 +1245,14 @@ static inline void mark_inode_dirty_sync
 	__mark_inode_dirty(inode, I_DIRTY_SYNC);
 }
 
+/**
+ * inc_nlink - directly increment an inode's link count
+ * @inode: inode
+ *
+ * This is a low-level filesystem helper to replace any
+ * direct filesystem manipulation of i_nlink.  Currently,
+ * it is only here for parity with drop_nlink().
+ */
 static inline void inc_nlink(struct inode *inode)
 {
 	inode->i_nlink++;
@@ -1255,14 +1264,63 @@ static inline void inode_inc_link_count(
 	mark_inode_dirty(inode);
 }
 
+/**
+ * check_nlink - check an inode's status after direct
+ * 		 i_nlink modification.
+ * @inode: inode
+ *
+ * Some filesystems can not make simple incremental changes
+ * to i_nlink, most notably clustered ones.  They must do
+ * direct manipulation of i_nlink.  This function must be
+ * called after such modifications are complete to make
+ * sure that the VFS knows that the inode is going to go
+ * away.
+ */
+static inline void check_nlink(struct inode *inode)
+{
+	if (inode->i_nlink)
+		return;
+
+	inode->i_state |= I_AWAITING_FINAL_IPUT;
+	spin_lock(&inode->i_sb->s_mnt_writers_lock);
+	inode->i_sb->s_writers++;
+	spin_unlock(&inode->i_sb->s_mnt_writers_lock);
+}
+
+/**
+ * drop_nlink - directly drop an inode's link count
+ * @inode: inode
+ *
+ * This is a low-level filesystem helper to replace any
+ * direct filesystem manipulation of i_nlink.  In cases
+ * where we are attempting to track writes to the
+ * filesystem, a decrement to zero means an imminent
+ * write when the file is truncated and actually unlinked
+ * on the filesystem.
+ */
 static inline void drop_nlink(struct inode *inode)
 {
 	inode->i_nlink--;
+	check_nlink(inode);
 }
 
+/**
+ * clear_nlink - directly zero an inode's link count
+ * @inode: inode
+ *
+ * This is a low-level filesystem helper to replace any
+ * direct filesystem manipulation of i_nlink.  See
+ * drop_nlink() for why we care about i_nlink hitting zero.
+ *
+ * Note that we could do the i_state flag directly in here,
+ * but we call check_nlink() to keep the number of places
+ * where the flag is set to exactly one.  The compiler
+ * should get rid of the superfluous i_nlink check.
+ */
 static inline void clear_nlink(struct inode *inode)
 {
 	inode->i_nlink = 0;
+	check_nlink(inode);
 }
 
 static inline void inode_dec_link_count(struct inode *inode)
_
