Linux-Fsdevel Archive on lore.kernel.org
help / color / mirror / Atom feed
* [RFC PATCH] vfs: remove lockdep bogosity in __sb_start_write
@ 2020-11-03 17:33 Darrick J. Wong
2020-11-03 17:39 ` Christoph Hellwig
2020-11-03 18:33 ` [RFC PATCH 2/2] vfs: separate __sb_start_write into blocking and non-blocking helpers Darrick J. Wong
0 siblings, 2 replies; 9+ messages in thread
From: Darrick J. Wong @ 2020-11-03 17:33 UTC (permalink / raw)
To: xfs, linux-fsdevel; +Cc: Dave Chinner, fdmanana
From: Darrick J. Wong <darrick.wong@oracle.com>
__sb_start_write has some weird looking lockdep code that claims to
exist to handle nested freeze locking requests from xfs. The code as
written seems broken -- if we think we hold a read lock on any of the
higher freeze levels (e.g. we hold SB_FREEZE_WRITE and are trying to
lock SB_FREEZE_PAGEFAULT), it converts a blocking lock attempt into a
trylock.
However, it's not correct to downgrade a blocking lock attempt to a
trylock unless the downgrading code or the callers are prepared to deal
with that situation. Neither __sb_start_write nor its callers handle
this at all. For example:
sb_start_pagefault ignores the return value completely, with the result
that if xfs_filemap_fault loses a race with a different thread trying to
fsfreeze, it will proceed without pagefault freeze protection (thereby
breaking locking rules) and then unlocks the pagefault freeze lock that
it doesn't own on its way out (thereby corrupting the lock state), which
leads to a system hang shortly afterwards.
Normally, this won't happen because our ownership of a read lock on a
higher freeze protection level blocks fsfreeze from grabbing a write
lock on that higher level. *However*, if lockdep is offline,
lock_is_held_type unconditionally returns 1, which means that
percpu_rwsem_is_held returns 1, which means that __sb_start_write
unconditionally converts blocking freeze lock attempts into trylocks,
even when we *don't* hold anything that would block a fsfreeze.
Apparently this all held together until 5.10-rc1, when bugs in lockdep
caused lockdep to shut itself off early in an fstests run, and once
fstests gets to the "race writes with freezer" tests, kaboom. This
might explain the long trail of vanishingly infrequent livelocks in
fstests after lockdep goes offline that I've never been able to
diagnose.
We could fix it by spinning on the trylock if wait==true, but AFAICT the
locking works fine if lockdep is not built at all (and I didn't see any
complaints running fstests overnight), so remove this snippet entirely.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/super.c | 33 ++++-----------------------------
1 file changed, 4 insertions(+), 29 deletions(-)
diff --git a/fs/super.c b/fs/super.c
index a51c2083cd6b..e1fd667454d4 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1647,36 +1647,11 @@ EXPORT_SYMBOL(__sb_end_write);
*/
int __sb_start_write(struct super_block *sb, int level, bool wait)
{
- bool force_trylock = false;
- int ret = 1;
+ if (!wait)
+ return percpu_down_read_trylock(sb->s_writers.rw_sem + level-1);
-#ifdef CONFIG_LOCKDEP
- /*
- * We want lockdep to tell us about possible deadlocks with freezing
- * but it's it bit tricky to properly instrument it. Getting a freeze
- * protection works as getting a read lock but there are subtle
- * problems. XFS for example gets freeze protection on internal level
- * twice in some cases, which is OK only because we already hold a
- * freeze protection also on higher level. Due to these cases we have
- * to use wait == F (trylock mode) which must not fail.
- */
- if (wait) {
- int i;
-
- for (i = 0; i < level - 1; i++)
- if (percpu_rwsem_is_held(sb->s_writers.rw_sem + i)) {
- force_trylock = true;
- break;
- }
- }
-#endif
- if (wait && !force_trylock)
- percpu_down_read(sb->s_writers.rw_sem + level-1);
- else
- ret = percpu_down_read_trylock(sb->s_writers.rw_sem + level-1);
-
- WARN_ON(force_trylock && !ret);
- return ret;
+ percpu_down_read(sb->s_writers.rw_sem + level-1);
+ return 1;
}
EXPORT_SYMBOL(__sb_start_write);
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [RFC PATCH] vfs: remove lockdep bogosity in __sb_start_write
2020-11-03 17:33 [RFC PATCH] vfs: remove lockdep bogosity in __sb_start_write Darrick J. Wong
@ 2020-11-03 17:39 ` Christoph Hellwig
2020-11-03 18:34 ` Darrick J. Wong
2020-11-03 18:33 ` [RFC PATCH 2/2] vfs: separate __sb_start_write into blocking and non-blocking helpers Darrick J. Wong
1 sibling, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2020-11-03 17:39 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: xfs, linux-fsdevel, Dave Chinner, fdmanana
> int __sb_start_write(struct super_block *sb, int level, bool wait)
> {
> - int ret = 1;
> + if (!wait)
> + return percpu_down_read_trylock(sb->s_writers.rw_sem + level-1);
>
> + percpu_down_read(sb->s_writers.rw_sem + level-1);
> + return 1;
> }
> EXPORT_SYMBOL(__sb_start_write);
Please split the function into __sb_start_write and
__sb_start_write_trylock while you're at it..
^ permalink raw reply [flat|nested] 9+ messages in thread
* [RFC PATCH 2/2] vfs: separate __sb_start_write into blocking and non-blocking helpers
2020-11-03 17:33 [RFC PATCH] vfs: remove lockdep bogosity in __sb_start_write Darrick J. Wong
2020-11-03 17:39 ` Christoph Hellwig
@ 2020-11-03 18:33 ` Darrick J. Wong
1 sibling, 0 replies; 9+ messages in thread
From: Darrick J. Wong @ 2020-11-03 18:33 UTC (permalink / raw)
To: xfs, linux-fsdevel; +Cc: Dave Chinner, fdmanana, Christoph Hellwig
From: Darrick J. Wong <darrick.wong@oracle.com>
Break this function into two helpers so that it's obvious that the
trylock versions return a value that must be checked, and the blocking
versions don't require that. While we're at it, clean up the return
type mismatches.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/aio.c | 2 +-
fs/io_uring.c | 3 +--
fs/super.c | 18 ++++++++++++------
include/linux/fs.h | 21 +++++++++++----------
4 files changed, 25 insertions(+), 19 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index c45c20d87538..04bb4bac327f 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1572,7 +1572,7 @@ static int aio_write(struct kiocb *req, const struct iocb *iocb,
* we return to userspace.
*/
if (S_ISREG(file_inode(file)->i_mode)) {
- __sb_start_write(file_inode(file)->i_sb, SB_FREEZE_WRITE, true);
+ __sb_start_write(file_inode(file)->i_sb, SB_FREEZE_WRITE);
__sb_writers_release(file_inode(file)->i_sb, SB_FREEZE_WRITE);
}
req->ki_flags |= IOCB_WRITE;
diff --git a/fs/io_uring.c b/fs/io_uring.c
index b42dfa0243bf..b735fecb49da 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -3532,8 +3532,7 @@ static int io_write(struct io_kiocb *req, bool force_nonblock,
* we return to userspace.
*/
if (req->flags & REQ_F_ISREG) {
- __sb_start_write(file_inode(req->file)->i_sb,
- SB_FREEZE_WRITE, true);
+ __sb_start_write(file_inode(req->file)->i_sb, SB_FREEZE_WRITE);
__sb_writers_release(file_inode(req->file)->i_sb,
SB_FREEZE_WRITE);
}
diff --git a/fs/super.c b/fs/super.c
index e1fd667454d4..59aa59279133 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1645,16 +1645,22 @@ EXPORT_SYMBOL(__sb_end_write);
* This is an internal function, please use sb_start_{write,pagefault,intwrite}
* instead.
*/
-int __sb_start_write(struct super_block *sb, int level, bool wait)
+void __sb_start_write(struct super_block *sb, int level)
{
- if (!wait)
- return percpu_down_read_trylock(sb->s_writers.rw_sem + level-1);
-
- percpu_down_read(sb->s_writers.rw_sem + level-1);
- return 1;
+ percpu_down_read(sb->s_writers.rw_sem + level - 1);
}
EXPORT_SYMBOL(__sb_start_write);
+/*
+ * This is an internal function, please use sb_start_{write,pagefault,intwrite}
+ * instead.
+ */
+bool __sb_start_write_trylock(struct super_block *sb, int level)
+{
+ return percpu_down_read_trylock(sb->s_writers.rw_sem + level - 1);
+}
+EXPORT_SYMBOL_GPL(__sb_start_write_trylock);
+
/**
* sb_wait_write - wait until all writers to given file system finish
* @sb: the super for which we wait
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0bd126418bb6..98548ec7de0b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1581,7 +1581,8 @@ extern struct timespec64 current_time(struct inode *inode);
*/
void __sb_end_write(struct super_block *sb, int level);
-int __sb_start_write(struct super_block *sb, int level, bool wait);
+void __sb_start_write(struct super_block *sb, int level);
+bool __sb_start_write_trylock(struct super_block *sb, int level);
#define __sb_writers_acquired(sb, lev) \
percpu_rwsem_acquire(&(sb)->s_writers.rw_sem[(lev)-1], 1, _THIS_IP_)
@@ -1645,12 +1646,12 @@ static inline void sb_end_intwrite(struct super_block *sb)
*/
static inline void sb_start_write(struct super_block *sb)
{
- __sb_start_write(sb, SB_FREEZE_WRITE, true);
+ __sb_start_write(sb, SB_FREEZE_WRITE);
}
-static inline int sb_start_write_trylock(struct super_block *sb)
+static inline bool sb_start_write_trylock(struct super_block *sb)
{
- return __sb_start_write(sb, SB_FREEZE_WRITE, false);
+ return __sb_start_write_trylock(sb, SB_FREEZE_WRITE);
}
/**
@@ -1674,7 +1675,7 @@ static inline int sb_start_write_trylock(struct super_block *sb)
*/
static inline void sb_start_pagefault(struct super_block *sb)
{
- __sb_start_write(sb, SB_FREEZE_PAGEFAULT, true);
+ __sb_start_write(sb, SB_FREEZE_PAGEFAULT);
}
/*
@@ -1692,12 +1693,12 @@ static inline void sb_start_pagefault(struct super_block *sb)
*/
static inline void sb_start_intwrite(struct super_block *sb)
{
- __sb_start_write(sb, SB_FREEZE_FS, true);
+ __sb_start_write(sb, SB_FREEZE_FS);
}
-static inline int sb_start_intwrite_trylock(struct super_block *sb)
+static inline bool sb_start_intwrite_trylock(struct super_block *sb)
{
- return __sb_start_write(sb, SB_FREEZE_FS, false);
+ return __sb_start_write_trylock(sb, SB_FREEZE_FS);
}
@@ -2756,14 +2757,14 @@ static inline void file_start_write(struct file *file)
{
if (!S_ISREG(file_inode(file)->i_mode))
return;
- __sb_start_write(file_inode(file)->i_sb, SB_FREEZE_WRITE, true);
+ __sb_start_write(file_inode(file)->i_sb, SB_FREEZE_WRITE);
}
static inline bool file_start_write_trylock(struct file *file)
{
if (!S_ISREG(file_inode(file)->i_mode))
return true;
- return __sb_start_write(file_inode(file)->i_sb, SB_FREEZE_WRITE, false);
+ return __sb_start_write_trylock(file_inode(file)->i_sb, SB_FREEZE_WRITE);
}
static inline void file_end_write(struct file *file)
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [RFC PATCH] vfs: remove lockdep bogosity in __sb_start_write
2020-11-03 17:39 ` Christoph Hellwig
@ 2020-11-03 18:34 ` Darrick J. Wong
2020-11-03 18:46 ` Christoph Hellwig
0 siblings, 1 reply; 9+ messages in thread
From: Darrick J. Wong @ 2020-11-03 18:34 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs, linux-fsdevel, Dave Chinner, fdmanana
On Tue, Nov 03, 2020 at 05:39:21PM +0000, Christoph Hellwig wrote:
> > int __sb_start_write(struct super_block *sb, int level, bool wait)
> > {
> > - int ret = 1;
> > + if (!wait)
> > + return percpu_down_read_trylock(sb->s_writers.rw_sem + level-1);
> >
> > + percpu_down_read(sb->s_writers.rw_sem + level-1);
> > + return 1;
> > }
> > EXPORT_SYMBOL(__sb_start_write);
>
> Please split the function into __sb_start_write and
> __sb_start_write_trylock while you're at it..
Any thoughts on this patch itself? I don't feel like I have 100% of the
context to know whether the removal is a good idea for non-xfs
filesystems, though I'm fairly sure the current logic is broken.
(Sending a second cleanup patch...)
--D
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH] vfs: remove lockdep bogosity in __sb_start_write
2020-11-03 18:34 ` Darrick J. Wong
@ 2020-11-03 18:46 ` Christoph Hellwig
2020-11-03 19:37 ` Darrick J. Wong
0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2020-11-03 18:46 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Christoph Hellwig, xfs, linux-fsdevel, Dave Chinner, fdmanana
On Tue, Nov 03, 2020 at 10:34:44AM -0800, Darrick J. Wong wrote:
> > Please split the function into __sb_start_write and
> > __sb_start_write_trylock while you're at it..
>
> Any thoughts on this patch itself? I don't feel like I have 100% of the
> context to know whether the removal is a good idea for non-xfs
> filesystems, though I'm fairly sure the current logic is broken.
The existing logic looks pretty bogus to me as well. Did you try to find
the discussion that lead to it?
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH] vfs: remove lockdep bogosity in __sb_start_write
2020-11-03 18:46 ` Christoph Hellwig
@ 2020-11-03 19:37 ` Darrick J. Wong
2020-11-05 21:34 ` Dave Chinner
0 siblings, 1 reply; 9+ messages in thread
From: Darrick J. Wong @ 2020-11-03 19:37 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs, linux-fsdevel, Dave Chinner, fdmanana
On Tue, Nov 03, 2020 at 06:46:59PM +0000, Christoph Hellwig wrote:
> On Tue, Nov 03, 2020 at 10:34:44AM -0800, Darrick J. Wong wrote:
> > > Please split the function into __sb_start_write and
> > > __sb_start_write_trylock while you're at it..
> >
> > Any thoughts on this patch itself? I don't feel like I have 100% of the
> > context to know whether the removal is a good idea for non-xfs
> > filesystems, though I'm fairly sure the current logic is broken.
>
> The existing logic looks pretty bogus to me as well. Did you try to find
> the discussion that lead to it?
TBH I don't know where the discussion happened. The "convert to
trylock" behavior first appeared as commit 5accdf82ba25c back in 2012;
that commit seems to have come from v6 of a patch[1] that Jan Kara sent
to try to fix fs freeze handling back in 2012. The behavior was not in
the v5[0] patch, nor was there any discussion for any of the v5 patches
that would suggest why things changed from v5 to v6.
Dave and I were talking about this on IRC yesterday, and his memory
thought that this was lockdep trying to handle xfs taking intwrite
protection while handling a write (or page_mkwrite) operation. I'm not
sure where "XFS for example gets freeze protection on internal level
twice in some cases" would actually happen -- did xfs support nested
transactions in the past? We definitely don't now, so I don't think the
comment is valid anymore.
The last commit to touch this area was f4b554af9931 (in 2015), which
says that Dave explained that the trylock hack + comment could be
removed, but the patch author never did that, and lore doesn't seem to
know where or when Dave actually said that?
FWIW I couldn't find a place where we could take the freeze locks in the
wrong order (or multiple times) -- I'm pretty sure the file remap stuff
(clone/dedupe/copyrange) all grab write freeze protection; I couldn't
figure out how you could start with pagefault freeze protection and then
need write freeze protection; and AFAIK none of the filesystems that
even care about intwrite will try to grab it recursively. Most of them
don't allow nested transactions, and the one that does (nilfs2?) appears
to be careful not to nest.
--D
[0] https://lore.kernel.org/linux-fsdevel/1334592845-22862-14-git-send-email-jack@suse.cz/
[1] https://lore.kernel.org/linux-fsdevel/1338589841-9568-14-git-send-email-jack@suse.cz/
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH] vfs: remove lockdep bogosity in __sb_start_write
2020-11-03 19:37 ` Darrick J. Wong
@ 2020-11-05 21:34 ` Dave Chinner
2020-11-06 2:19 ` Darrick J. Wong
0 siblings, 1 reply; 9+ messages in thread
From: Dave Chinner @ 2020-11-05 21:34 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: Christoph Hellwig, xfs, linux-fsdevel, fdmanana
On Tue, Nov 03, 2020 at 11:37:50AM -0800, Darrick J. Wong wrote:
> On Tue, Nov 03, 2020 at 06:46:59PM +0000, Christoph Hellwig wrote:
> > On Tue, Nov 03, 2020 at 10:34:44AM -0800, Darrick J. Wong wrote:
> > > > Please split the function into __sb_start_write and
> > > > __sb_start_write_trylock while you're at it..
> > >
> > > Any thoughts on this patch itself? I don't feel like I have 100% of the
> > > context to know whether the removal is a good idea for non-xfs
> > > filesystems, though I'm fairly sure the current logic is broken.
> >
> > The existing logic looks pretty bogus to me as well. Did you try to find
> > the discussion that lead to it?
>
> TBH I don't know where the discussion happened. The "convert to
> trylock" behavior first appeared as commit 5accdf82ba25c back in 2012;
> that commit seems to have come from v6 of a patch[1] that Jan Kara sent
> to try to fix fs freeze handling back in 2012. The behavior was not in
> the v5[0] patch, nor was there any discussion for any of the v5 patches
> that would suggest why things changed from v5 to v6.
>
> Dave and I were talking about this on IRC yesterday, and his memory
> thought that this was lockdep trying to handle xfs taking intwrite
> protection while handling a write (or page_mkwrite) operation. I'm not
> sure where "XFS for example gets freeze protection on internal level
> twice in some cases" would actually happen -- did xfs support nested
> transactions in the past? We definitely don't now, so I don't think the
> comment is valid anymore.
>
> The last commit to touch this area was f4b554af9931 (in 2015), which
> says that Dave explained that the trylock hack + comment could be
> removed, but the patch author never did that, and lore doesn't seem to
> know where or when Dave actually said that?
I'm pretty sure this "nesting internal freeze references" stems from
the fact we log and flush the superblock after fulling freezing the
filesystem to dirty the journal so recovery after a crash while
frozen handles unlinked inodes.
The high level VFS freeze annotations were not able to handle
running this transaction when transactions were supposed to already
be blocked and drained, so there was a special hack to hide it from
lockdep. Then we ended up hiding it from the VFS via
XFS_TRANS_NO_WRITECOUNT in xfs_sync_sb() because we needed it in
more places than just freeze (e.g. the log covering code
run by the background log worker). It's kinda documented here:
/*
* xfs_sync_sb
*
* Sync the superblock to disk.
*
* Note that the caller is responsible for checking the frozen state of the
* filesystem. This procedure uses the non-blocking transaction allocator and
* thus will allow modifications to a frozen fs. This is required because this
* code can be called during the process of freezing where use of the high-level
* allocator would deadlock.
*/
So, AFAICT, the whole "XFS nests internal transactions" lockdep
handling in __sb_start_write() has been unnecessary for quite a few
years now....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH] vfs: remove lockdep bogosity in __sb_start_write
2020-11-05 21:34 ` Dave Chinner
@ 2020-11-06 2:19 ` Darrick J. Wong
2020-11-06 3:32 ` Dave Chinner
0 siblings, 1 reply; 9+ messages in thread
From: Darrick J. Wong @ 2020-11-06 2:19 UTC (permalink / raw)
To: Dave Chinner; +Cc: Christoph Hellwig, xfs, linux-fsdevel, fdmanana
On Fri, Nov 06, 2020 at 08:34:15AM +1100, Dave Chinner wrote:
> On Tue, Nov 03, 2020 at 11:37:50AM -0800, Darrick J. Wong wrote:
> > On Tue, Nov 03, 2020 at 06:46:59PM +0000, Christoph Hellwig wrote:
> > > On Tue, Nov 03, 2020 at 10:34:44AM -0800, Darrick J. Wong wrote:
> > > > > Please split the function into __sb_start_write and
> > > > > __sb_start_write_trylock while you're at it..
> > > >
> > > > Any thoughts on this patch itself? I don't feel like I have 100% of the
> > > > context to know whether the removal is a good idea for non-xfs
> > > > filesystems, though I'm fairly sure the current logic is broken.
> > >
> > > The existing logic looks pretty bogus to me as well. Did you try to find
> > > the discussion that lead to it?
> >
> > TBH I don't know where the discussion happened. The "convert to
> > trylock" behavior first appeared as commit 5accdf82ba25c back in 2012;
> > that commit seems to have come from v6 of a patch[1] that Jan Kara sent
> > to try to fix fs freeze handling back in 2012. The behavior was not in
> > the v5[0] patch, nor was there any discussion for any of the v5 patches
> > that would suggest why things changed from v5 to v6.
> >
> > Dave and I were talking about this on IRC yesterday, and his memory
> > thought that this was lockdep trying to handle xfs taking intwrite
> > protection while handling a write (or page_mkwrite) operation. I'm not
> > sure where "XFS for example gets freeze protection on internal level
> > twice in some cases" would actually happen -- did xfs support nested
> > transactions in the past? We definitely don't now, so I don't think the
> > comment is valid anymore.
> >
> > The last commit to touch this area was f4b554af9931 (in 2015), which
> > says that Dave explained that the trylock hack + comment could be
> > removed, but the patch author never did that, and lore doesn't seem to
> > know where or when Dave actually said that?
>
> I'm pretty sure this "nesting internal freeze references" stems from
> the fact we log and flush the superblock after fulling freezing the
> filesystem to dirty the journal so recovery after a crash while
> frozen handles unlinked inodes.
>
> The high level VFS freeze annotations were not able to handle
> running this transaction when transactions were supposed to already
> be blocked and drained, so there was a special hack to hide it from
> lockdep. Then we ended up hiding it from the VFS via
> XFS_TRANS_NO_WRITECOUNT in xfs_sync_sb() because we needed it in
> more places than just freeze (e.g. the log covering code
> run by the background log worker). It's kinda documented here:
>
> /*
> * xfs_sync_sb
> *
> * Sync the superblock to disk.
> *
> * Note that the caller is responsible for checking the frozen state of the
> * filesystem. This procedure uses the non-blocking transaction allocator and
> * thus will allow modifications to a frozen fs. This is required because this
> * code can be called during the process of freezing where use of the high-level
> * allocator would deadlock.
> */
>
> So, AFAICT, the whole "XFS nests internal transactions" lockdep
> handling in __sb_start_write() has been unnecessary for quite a few
> years now....
Yeah. Would you be willing to RVB this, or are you all waiting for a v2
series?
--D
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH] vfs: remove lockdep bogosity in __sb_start_write
2020-11-06 2:19 ` Darrick J. Wong
@ 2020-11-06 3:32 ` Dave Chinner
0 siblings, 0 replies; 9+ messages in thread
From: Dave Chinner @ 2020-11-06 3:32 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: Christoph Hellwig, xfs, linux-fsdevel, fdmanana
On Thu, Nov 05, 2020 at 06:19:51PM -0800, Darrick J. Wong wrote:
> On Fri, Nov 06, 2020 at 08:34:15AM +1100, Dave Chinner wrote:
> > On Tue, Nov 03, 2020 at 11:37:50AM -0800, Darrick J. Wong wrote:
> > > On Tue, Nov 03, 2020 at 06:46:59PM +0000, Christoph Hellwig wrote:
> > > > On Tue, Nov 03, 2020 at 10:34:44AM -0800, Darrick J. Wong wrote:
> > > > > > Please split the function into __sb_start_write and
> > > > > > __sb_start_write_trylock while you're at it..
> > > > >
> > > > > Any thoughts on this patch itself? I don't feel like I have 100% of the
> > > > > context to know whether the removal is a good idea for non-xfs
> > > > > filesystems, though I'm fairly sure the current logic is broken.
> > > >
> > > > The existing logic looks pretty bogus to me as well. Did you try to find
> > > > the discussion that lead to it?
> > >
> > > TBH I don't know where the discussion happened. The "convert to
> > > trylock" behavior first appeared as commit 5accdf82ba25c back in 2012;
> > > that commit seems to have come from v6 of a patch[1] that Jan Kara sent
> > > to try to fix fs freeze handling back in 2012. The behavior was not in
> > > the v5[0] patch, nor was there any discussion for any of the v5 patches
> > > that would suggest why things changed from v5 to v6.
> > >
> > > Dave and I were talking about this on IRC yesterday, and his memory
> > > thought that this was lockdep trying to handle xfs taking intwrite
> > > protection while handling a write (or page_mkwrite) operation. I'm not
> > > sure where "XFS for example gets freeze protection on internal level
> > > twice in some cases" would actually happen -- did xfs support nested
> > > transactions in the past? We definitely don't now, so I don't think the
> > > comment is valid anymore.
> > >
> > > The last commit to touch this area was f4b554af9931 (in 2015), which
> > > says that Dave explained that the trylock hack + comment could be
> > > removed, but the patch author never did that, and lore doesn't seem to
> > > know where or when Dave actually said that?
> >
> > I'm pretty sure this "nesting internal freeze references" stems from
> > the fact we log and flush the superblock after fulling freezing the
> > filesystem to dirty the journal so recovery after a crash while
> > frozen handles unlinked inodes.
> >
> > The high level VFS freeze annotations were not able to handle
> > running this transaction when transactions were supposed to already
> > be blocked and drained, so there was a special hack to hide it from
> > lockdep. Then we ended up hiding it from the VFS via
> > XFS_TRANS_NO_WRITECOUNT in xfs_sync_sb() because we needed it in
> > more places than just freeze (e.g. the log covering code
> > run by the background log worker). It's kinda documented here:
> >
> > /*
> > * xfs_sync_sb
> > *
> > * Sync the superblock to disk.
> > *
> > * Note that the caller is responsible for checking the frozen state of the
> > * filesystem. This procedure uses the non-blocking transaction allocator and
> > * thus will allow modifications to a frozen fs. This is required because this
> > * code can be called during the process of freezing where use of the high-level
> > * allocator would deadlock.
> > */
> >
> > So, AFAICT, the whole "XFS nests internal transactions" lockdep
> > handling in __sb_start_write() has been unnecessary for quite a few
> > years now....
>
> Yeah. Would you be willing to RVB this, or are you all waiting for a v2
> series?
Send out a v2 - you probably need to include some of the above
information in the change log removing the lockdep stuff so it's
preserved this time...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2020-11-06 3:32 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-03 17:33 [RFC PATCH] vfs: remove lockdep bogosity in __sb_start_write Darrick J. Wong
2020-11-03 17:39 ` Christoph Hellwig
2020-11-03 18:34 ` Darrick J. Wong
2020-11-03 18:46 ` Christoph Hellwig
2020-11-03 19:37 ` Darrick J. Wong
2020-11-05 21:34 ` Dave Chinner
2020-11-06 2:19 ` Darrick J. Wong
2020-11-06 3:32 ` Dave Chinner
2020-11-03 18:33 ` [RFC PATCH 2/2] vfs: separate __sb_start_write into blocking and non-blocking helpers Darrick J. Wong
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).