LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH V4 0/5] ext4: get discard out of jbd2 commit kthread
@ 2021-08-30  7:52 Wang Jianchao
  2021-08-30  7:52 ` [PATCH V4 1/5] ext4: remove the 'group' parameter of ext4_trim_extent Wang Jianchao
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Wang Jianchao @ 2021-08-30  7:52 UTC (permalink / raw)
  To: tytso, adilger.kernel; +Cc: jack, guoqing.jiang, linux-kernel, linux-ext4

Hi all

This is the version 4 patch set that attempts to get discard out of the jbd2
commit kthread. When the user delete a lot data and cause discard flooding,
the jbd2 commit kthread can be blocked for very long time and then all of
the metadata operations are blocked due to no journal space.

The xfstest with following parameters,
MODULAR=0
TEST_DIR=/mnt/test
TEST_DEV=/dev/nbd37p1
SCRATCH_MNT=/mnt/scratch
SCRATCH_DEV=/dev/nbd37p2
MOUNT_OPTIONS="-o discard"
has passed. The result is consistent w/ or w/o this patch set.

There are 5 patches,

Patch 1 ~ 3, there are no functional changes in them, but just some preparation
for following patches

Patch 4 introduces a async kworker to do discard in fstrim fation which implements
the core idea of this patch set.

Patch 5 flush discard background work in ext4_should_retry_alloc, This fix the generic/371

Any comments are welcome ;)

V3 -> V4:
 - In patch 1, avoid modify two lines in patch 1 when remove 'grou'p parameter
 - In patch 4, remove redunbant queue_work() in ext4_mb_release(). And queue background
   discard work to system_unbound_wq as it is not a urgent task.
 - In patch 5, flush discard kwork in ext4_should_retry_alloc(), instead of invoke
   ext4_should_retry_alloc in fallocate again.
   
V2 -> V3
 - Get rid of the per block group rb tree which carries freed entry. It is not neccesary
   because we have done aggregation when wait for journal commit. Just use a list
   to carry the free entries.

V1 -> V2
 - free the blocks back to mb buddy after commit and then do ftrim fashion discard

 fs/ext4/balloc.c  |   8 ++-
 fs/ext4/ext4.h    |   3 ++
 fs/ext4/mballoc.c | 216 +++++++++++++++++++++++++++++++++++++++++++++++++++-----------------------------
 3 files changed, 148 insertions(+), 79 deletions(-)


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH V4 1/5] ext4: remove the 'group' parameter of ext4_trim_extent
  2021-08-30  7:52 [PATCH V4 0/5] ext4: get discard out of jbd2 commit kthread Wang Jianchao
@ 2021-08-30  7:52 ` Wang Jianchao
  2021-08-30  7:52 ` [PATCH V4 2/5] ext4: add new helper interface ext4_try_to_trim_range() Wang Jianchao
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Wang Jianchao @ 2021-08-30  7:52 UTC (permalink / raw)
  To: tytso, adilger.kernel; +Cc: jack, guoqing.jiang, linux-kernel, linux-ext4

From: Wang Jianchao <wangjianchao@kuaishou.com>

Get rid of the 'group' parameter of ext4_trim_extent as we can get
it from the 'e4b'.

Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
---
 fs/ext4/mballoc.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 089c958aa2c3..09fb0223afaa 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6183,7 +6183,6 @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
  * @sb:		super block for the file system
  * @start:	starting block of the free extent in the alloc. group
  * @count:	number of blocks to TRIM
- * @group:	alloc. group we are working with
  * @e4b:	ext4 buddy for the group
  *
  * Trim "count" blocks starting at "start" in the "group". To assure that no
@@ -6191,11 +6190,12 @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
  * be called with under the group lock.
  */
 static int ext4_trim_extent(struct super_block *sb, int start, int count,
-			     ext4_group_t group, struct ext4_buddy *e4b)
+			struct ext4_buddy *e4b)
 __releases(bitlock)
 __acquires(bitlock)
 {
 	struct ext4_free_extent ex;
+	ext4_group_t group = e4b->bd_group;
 	int ret = 0;
 
 	trace_ext4_trim_extent(sb, group, start, count);
@@ -6271,8 +6271,7 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
 		next = mb_find_next_bit(bitmap, max + 1, start);
 
 		if ((next - start) >= minblocks) {
-			ret = ext4_trim_extent(sb, start,
-					       next - start, group, &e4b);
+			ret = ext4_trim_extent(sb, start, next - start, &e4b);
 			if (ret && ret != -EOPNOTSUPP)
 				break;
 			ret = 0;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH V4 2/5] ext4: add new helper interface ext4_try_to_trim_range()
  2021-08-30  7:52 [PATCH V4 0/5] ext4: get discard out of jbd2 commit kthread Wang Jianchao
  2021-08-30  7:52 ` [PATCH V4 1/5] ext4: remove the 'group' parameter of ext4_trim_extent Wang Jianchao
@ 2021-08-30  7:52 ` Wang Jianchao
  2021-08-30  7:52 ` [PATCH V4 3/5] ext4: remove the repeated comment of ext4_trim_all_free Wang Jianchao
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Wang Jianchao @ 2021-08-30  7:52 UTC (permalink / raw)
  To: tytso, adilger.kernel; +Cc: jack, guoqing.jiang, linux-kernel, linux-ext4

From: Wang Jianchao <wangjianchao@kuaishou.com>

There is no functional change in this patch but just split the
codes, which serachs free block and does trim, into a new function
ext4_try_to_trim_range. This is preparing for the following async
backgroup discard.

Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
---
 fs/ext4/mballoc.c | 102 ++++++++++++++++++++++++++--------------------
 1 file changed, 57 insertions(+), 45 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 09fb0223afaa..e47089cc6c07 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6218,6 +6218,54 @@ __acquires(bitlock)
 	return ret;
 }
 
+static int ext4_try_to_trim_range(struct super_block *sb,
+		struct ext4_buddy *e4b, ext4_grpblk_t start,
+		ext4_grpblk_t max, ext4_grpblk_t minblocks)
+{
+	ext4_grpblk_t next, count, free_count;
+	void *bitmap;
+	int ret = 0;
+
+	bitmap = e4b->bd_bitmap;
+	start = (e4b->bd_info->bb_first_free > start) ?
+		e4b->bd_info->bb_first_free : start;
+	count = 0;
+	free_count = 0;
+
+	while (start <= max) {
+		start = mb_find_next_zero_bit(bitmap, max + 1, start);
+		if (start > max)
+			break;
+		next = mb_find_next_bit(bitmap, max + 1, start);
+
+		if ((next - start) >= minblocks) {
+			ret = ext4_trim_extent(sb, start, next - start, e4b);
+			if (ret && ret != -EOPNOTSUPP)
+				break;
+			ret = 0;
+			count += next - start;
+		}
+		free_count += next - start;
+		start = next + 1;
+
+		if (fatal_signal_pending(current)) {
+			count = -ERESTARTSYS;
+			break;
+		}
+
+		if (need_resched()) {
+			ext4_unlock_group(sb, e4b->bd_group);
+			cond_resched();
+			ext4_lock_group(sb, e4b->bd_group);
+		}
+
+		if ((e4b->bd_info->bb_free - free_count) < minblocks)
+			break;
+	}
+
+	return count;
+}
+
 /**
  * ext4_trim_all_free -- function to trim all free space in alloc. group
  * @sb:			super block for file system
@@ -6241,10 +6289,8 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
 		   ext4_grpblk_t start, ext4_grpblk_t max,
 		   ext4_grpblk_t minblocks)
 {
-	void *bitmap;
-	ext4_grpblk_t next, count = 0, free_count = 0;
 	struct ext4_buddy e4b;
-	int ret = 0;
+	int ret;
 
 	trace_ext4_trim_all_free(sb, group, start, max);
 
@@ -6254,57 +6300,23 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
 			     ret, group);
 		return ret;
 	}
-	bitmap = e4b.bd_bitmap;
 
 	ext4_lock_group(sb, group);
-	if (EXT4_MB_GRP_WAS_TRIMMED(e4b.bd_info) &&
-	    minblocks >= atomic_read(&EXT4_SB(sb)->s_last_trim_minblks))
-		goto out;
-
-	start = (e4b.bd_info->bb_first_free > start) ?
-		e4b.bd_info->bb_first_free : start;
 
-	while (start <= max) {
-		start = mb_find_next_zero_bit(bitmap, max + 1, start);
-		if (start > max)
-			break;
-		next = mb_find_next_bit(bitmap, max + 1, start);
-
-		if ((next - start) >= minblocks) {
-			ret = ext4_trim_extent(sb, start, next - start, &e4b);
-			if (ret && ret != -EOPNOTSUPP)
-				break;
-			ret = 0;
-			count += next - start;
-		}
-		free_count += next - start;
-		start = next + 1;
-
-		if (fatal_signal_pending(current)) {
-			count = -ERESTARTSYS;
-			break;
-		}
-
-		if (need_resched()) {
-			ext4_unlock_group(sb, group);
-			cond_resched();
-			ext4_lock_group(sb, group);
-		}
-
-		if ((e4b.bd_info->bb_free - free_count) < minblocks)
-			break;
+	if (!EXT4_MB_GRP_WAS_TRIMMED(e4b.bd_info) ||
+	    minblocks < atomic_read(&EXT4_SB(sb)->s_last_trim_minblks)) {
+		ret = ext4_try_to_trim_range(sb, &e4b, start, max, minblocks);
+		if (ret >= 0)
+			EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
+	} else {
+		ret = 0;
 	}
 
-	if (!ret) {
-		ret = count;
-		EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
-	}
-out:
 	ext4_unlock_group(sb, group);
 	ext4_mb_unload_buddy(&e4b);
 
 	ext4_debug("trimmed %d blocks in the group %d\n",
-		count, group);
+		ret, group);
 
 	return ret;
 }
-- 
2.17.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH V4 3/5] ext4: remove the repeated comment of ext4_trim_all_free
  2021-08-30  7:52 [PATCH V4 0/5] ext4: get discard out of jbd2 commit kthread Wang Jianchao
  2021-08-30  7:52 ` [PATCH V4 1/5] ext4: remove the 'group' parameter of ext4_trim_extent Wang Jianchao
  2021-08-30  7:52 ` [PATCH V4 2/5] ext4: add new helper interface ext4_try_to_trim_range() Wang Jianchao
@ 2021-08-30  7:52 ` Wang Jianchao
  2021-08-30  7:52 ` [PATCH V4 4/5] ext4: get discard out of jbd2 commit kthread contex Wang Jianchao
  2021-08-30  7:52 ` [PATCH V4 5/5] ext4: flush background discard kwork when retry allocation Wang Jianchao
  4 siblings, 0 replies; 7+ messages in thread
From: Wang Jianchao @ 2021-08-30  7:52 UTC (permalink / raw)
  To: tytso, adilger.kernel; +Cc: jack, guoqing.jiang, linux-kernel, linux-ext4

From: Wang Jianchao <wangjianchao@kuaishou.com>

Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
---
 fs/ext4/mballoc.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index e47089cc6c07..ef2aa2b1a939 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6274,15 +6274,10 @@ static int ext4_try_to_trim_range(struct super_block *sb,
  * @max:		last group block to examine
  * @minblocks:		minimum extent block count
  *
- * ext4_trim_all_free walks through group's buddy bitmap searching for free
- * extents. When the free block is found, ext4_trim_extent is called to TRIM
- * the extent.
- *
- *
  * ext4_trim_all_free walks through group's block bitmap searching for free
  * extents. When the free extent is found, mark it as used in group buddy
  * bitmap. Then issue a TRIM command on this extent and free the extent in
- * the group buddy bitmap. This is done until whole group is scanned.
+ * the group buddy bitmap.
  */
 static ext4_grpblk_t
 ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
-- 
2.17.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH V4 4/5] ext4: get discard out of jbd2 commit kthread contex
  2021-08-30  7:52 [PATCH V4 0/5] ext4: get discard out of jbd2 commit kthread Wang Jianchao
                   ` (2 preceding siblings ...)
  2021-08-30  7:52 ` [PATCH V4 3/5] ext4: remove the repeated comment of ext4_trim_all_free Wang Jianchao
@ 2021-08-30  7:52 ` Wang Jianchao
  2021-08-31  4:00   ` Theodore Ts'o
  2021-08-30  7:52 ` [PATCH V4 5/5] ext4: flush background discard kwork when retry allocation Wang Jianchao
  4 siblings, 1 reply; 7+ messages in thread
From: Wang Jianchao @ 2021-08-30  7:52 UTC (permalink / raw)
  To: tytso, adilger.kernel; +Cc: jack, guoqing.jiang, linux-kernel, linux-ext4

From: Wang Jianchao <wangjianchao@kuaishou.com>

Right now, discard is issued and waited to be completed in jbd2
commit kthread context after the logs are committed. When large
amount of files are deleted and discard is flooding, jbd2 commit
kthread can be blocked for long time. Then all of the metadata
operations can be blocked to wait the log space.

One case is the page fault path with read mm->mmap_sem held, which
wants to update the file time but has to wait for the log space.
When other threads in the task wants to do mmap, then write mmap_sem
is blocked. Finally all of the following read mmap_sem requirements
are blocked, even the ps command which need to read the /proc/pid/
-cmdline. Our monitor service which needs to read /proc/pid/cmdline
used to be blocked for 5 mins.

This patch frees the blocks back to buddy after commit and then do
discard in a async kworker context in fstrim fashion, namely,
 - mark blocks to be discarded as used if they have not been allocated
 - do discard
 - mark them free
After this, jbd2 commit kthread won't be blocked any more by discard
and we won't get NOSPC even if the discard is slow or throttled.

Link: https://marc.info/?l=linux-kernel&m=162143690731901&w=2
Suggested-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
---
 fs/ext4/ext4.h    |   2 +
 fs/ext4/mballoc.c | 101 ++++++++++++++++++++++++++++++++++------------
 2 files changed, 78 insertions(+), 25 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 3c51e243450d..6b678b968d84 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1536,6 +1536,8 @@ struct ext4_sb_info {
 	unsigned int s_mb_free_pending;
 	struct list_head s_freed_data_list;	/* List of blocks to be freed
 						   after commit completed */
+	struct list_head s_discard_list;
+	struct work_struct s_discard_work;
 	struct rb_root s_mb_avg_fragment_size_root;
 	rwlock_t s_mb_rb_lock;
 	struct list_head *s_mb_largest_free_orders;
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index ef2aa2b1a939..259822fc0ae9 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -408,6 +408,10 @@ static void ext4_mb_new_preallocation(struct ext4_allocation_context *ac);
 static bool ext4_mb_good_group(struct ext4_allocation_context *ac,
 			       ext4_group_t group, int cr);
 
+static int ext4_try_to_trim_range(struct super_block *sb,
+		struct ext4_buddy *e4b, ext4_grpblk_t start,
+		ext4_grpblk_t max, ext4_grpblk_t minblocks);
+
 /*
  * The algorithm using this percpu seq counter goes below:
  * 1. We sample the percpu discard_pa_seq counter before trying for block
@@ -3308,6 +3312,55 @@ static int ext4_groupinfo_create_slab(size_t size)
 	return 0;
 }
 
+static void ext4_discard_work(struct work_struct *work)
+{
+	struct ext4_sb_info *sbi = container_of(work,
+			struct ext4_sb_info, s_discard_work);
+	struct super_block *sb = sbi->s_sb;
+	struct ext4_free_data *fd, *nfd;
+	struct ext4_buddy e4b;
+	struct list_head discard_list;
+	ext4_group_t grp, load_grp;
+	int err = 0;
+
+	INIT_LIST_HEAD(&discard_list);
+	spin_lock(&sbi->s_md_lock);
+	list_splice_init(&sbi->s_discard_list, &discard_list);
+	spin_unlock(&sbi->s_md_lock);
+
+	load_grp = UINT_MAX;
+	list_for_each_entry_safe(fd, nfd, &discard_list, efd_list) {
+		/*
+		 * If filesystem is umounting or no memory, give up the discard
+		 */
+		if ((sb->s_flags & SB_ACTIVE) && !err) {
+			grp = fd->efd_group;
+			if (grp != load_grp) {
+				if (load_grp != UINT_MAX)
+					ext4_mb_unload_buddy(&e4b);
+
+				err = ext4_mb_load_buddy(sb, grp, &e4b);
+				if (err) {
+					kmem_cache_free(ext4_free_data_cachep, fd);
+					load_grp = UINT_MAX;
+					continue;
+				} else {
+					load_grp = grp;
+				}
+			}
+
+			ext4_lock_group(sb, grp);
+			ext4_try_to_trim_range(sb, &e4b, fd->efd_start_cluster,
+						fd->efd_start_cluster + fd->efd_count - 1, 1);
+			ext4_unlock_group(sb, grp);
+		}
+		kmem_cache_free(ext4_free_data_cachep, fd);
+	}
+
+	if (load_grp != UINT_MAX)
+		ext4_mb_unload_buddy(&e4b);
+}
+
 int ext4_mb_init(struct super_block *sb)
 {
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
@@ -3376,6 +3429,8 @@ int ext4_mb_init(struct super_block *sb)
 	spin_lock_init(&sbi->s_md_lock);
 	sbi->s_mb_free_pending = 0;
 	INIT_LIST_HEAD(&sbi->s_freed_data_list);
+	INIT_LIST_HEAD(&sbi->s_discard_list);
+	INIT_WORK(&sbi->s_discard_work, ext4_discard_work);
 
 	sbi->s_mb_max_to_scan = MB_DEFAULT_MAX_TO_SCAN;
 	sbi->s_mb_min_to_scan = MB_DEFAULT_MIN_TO_SCAN;
@@ -3474,6 +3529,14 @@ int ext4_mb_release(struct super_block *sb)
 	struct kmem_cache *cachep = get_groupinfo_cache(sb->s_blocksize_bits);
 	int count;
 
+	if (test_opt(sb, DISCARD)) {
+		/*
+		 * wait the discard work to drain all of ext4_free_data
+		 */
+		flush_work(&sbi->s_discard_work);
+		WARN_ON_ONCE(!list_empty(&sbi->s_discard_list));
+	}
+
 	if (sbi->s_group_info) {
 		for (i = 0; i < ngroups; i++) {
 			cond_resched();
@@ -3596,7 +3659,6 @@ static void ext4_free_data_in_buddy(struct super_block *sb,
 		put_page(e4b.bd_bitmap_page);
 	}
 	ext4_unlock_group(sb, entry->efd_group);
-	kmem_cache_free(ext4_free_data_cachep, entry);
 	ext4_mb_unload_buddy(&e4b);
 
 	mb_debug(sb, "freed %d blocks in %d structures\n", count,
@@ -3611,10 +3673,9 @@ void ext4_process_freed_data(struct super_block *sb, tid_t commit_tid)
 {
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
 	struct ext4_free_data *entry, *tmp;
-	struct bio *discard_bio = NULL;
 	struct list_head freed_data_list;
 	struct list_head *cut_pos = NULL;
-	int err;
+	bool wake;
 
 	INIT_LIST_HEAD(&freed_data_list);
 
@@ -3629,30 +3690,20 @@ void ext4_process_freed_data(struct super_block *sb, tid_t commit_tid)
 				  cut_pos);
 	spin_unlock(&sbi->s_md_lock);
 
-	if (test_opt(sb, DISCARD)) {
-		list_for_each_entry(entry, &freed_data_list, efd_list) {
-			err = ext4_issue_discard(sb, entry->efd_group,
-						 entry->efd_start_cluster,
-						 entry->efd_count,
-						 &discard_bio);
-			if (err && err != -EOPNOTSUPP) {
-				ext4_msg(sb, KERN_WARNING, "discard request in"
-					 " group:%d block:%d count:%d failed"
-					 " with %d", entry->efd_group,
-					 entry->efd_start_cluster,
-					 entry->efd_count, err);
-			} else if (err == -EOPNOTSUPP)
-				break;
-		}
+	list_for_each_entry(entry, &freed_data_list, efd_list)
+		ext4_free_data_in_buddy(sb, entry);
 
-		if (discard_bio) {
-			submit_bio_wait(discard_bio);
-			bio_put(discard_bio);
-		}
+	if (test_opt(sb, DISCARD)) {
+		spin_lock(&sbi->s_md_lock);
+		wake = list_empty(&sbi->s_discard_list);
+		list_splice_tail(&freed_data_list, &sbi->s_discard_list);
+		spin_unlock(&sbi->s_md_lock);
+		if (wake)
+			queue_work(system_unbound_wq, &sbi->s_discard_work);
+	} else {
+		list_for_each_entry_safe(entry, tmp, &freed_data_list, efd_list)
+			kmem_cache_free(ext4_free_data_cachep, entry);
 	}
-
-	list_for_each_entry_safe(entry, tmp, &freed_data_list, efd_list)
-		ext4_free_data_in_buddy(sb, entry);
 }
 
 int __init ext4_init_mballoc(void)
-- 
2.17.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH V4 5/5] ext4: flush background discard kwork when retry allocation
  2021-08-30  7:52 [PATCH V4 0/5] ext4: get discard out of jbd2 commit kthread Wang Jianchao
                   ` (3 preceding siblings ...)
  2021-08-30  7:52 ` [PATCH V4 4/5] ext4: get discard out of jbd2 commit kthread contex Wang Jianchao
@ 2021-08-30  7:52 ` Wang Jianchao
  4 siblings, 0 replies; 7+ messages in thread
From: Wang Jianchao @ 2021-08-30  7:52 UTC (permalink / raw)
  To: tytso, adilger.kernel; +Cc: jack, guoqing.jiang, linux-kernel, linux-ext4

From: Wang Jianchao <wangjianchao@kuaishou.com>

The background discard kwork tries to mark blocks used and issue
discard. This can make filesystem suffer from NOSPC error, xfstest
generic/371 can fail due to it. Fix it by flushing discard kwork
in ext4_should_retry_alloc. At the same time, give up discard at
the moment.

Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
---
 fs/ext4/balloc.c  | 8 +++++++-
 fs/ext4/ext4.h    | 1 +
 fs/ext4/mballoc.c | 7 +++++--
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 9dc6e74b265c..a0fb0c4bdc7c 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -652,8 +652,14 @@ int ext4_should_retry_alloc(struct super_block *sb, int *retries)
 	 * possible we just missed a transaction commit that did so
 	 */
 	smp_mb();
-	if (sbi->s_mb_free_pending == 0)
+	if (sbi->s_mb_free_pending == 0) {
+		if (test_opt(sb, DISCARD)) {
+			atomic_inc(&sbi->s_retry_alloc_pending);
+			flush_work(&sbi->s_discard_work);
+			atomic_dec(&sbi->s_retry_alloc_pending);
+		}
 		return ext4_has_free_clusters(sbi, 1, 0);
+	}
 
 	/*
 	 * it's possible we've just missed a transaction commit here,
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 6b678b968d84..d71dcac3b97f 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1538,6 +1538,7 @@ struct ext4_sb_info {
 						   after commit completed */
 	struct list_head s_discard_list;
 	struct work_struct s_discard_work;
+	atomic_t s_retry_alloc_pending;
 	struct rb_root s_mb_avg_fragment_size_root;
 	rwlock_t s_mb_rb_lock;
 	struct list_head *s_mb_largest_free_orders;
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 259822fc0ae9..c0dea52a7124 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -3331,9 +3331,11 @@ static void ext4_discard_work(struct work_struct *work)
 	load_grp = UINT_MAX;
 	list_for_each_entry_safe(fd, nfd, &discard_list, efd_list) {
 		/*
-		 * If filesystem is umounting or no memory, give up the discard
+		 * If filesystem is umounting or no memory or suffering
+		 * from no space, give up the discard
 		 */
-		if ((sb->s_flags & SB_ACTIVE) && !err) {
+		if ((sb->s_flags & SB_ACTIVE) && !err &&
+		    !atomic_read(&sbi->s_retry_alloc_pending)) {
 			grp = fd->efd_group;
 			if (grp != load_grp) {
 				if (load_grp != UINT_MAX)
@@ -3431,6 +3433,7 @@ int ext4_mb_init(struct super_block *sb)
 	INIT_LIST_HEAD(&sbi->s_freed_data_list);
 	INIT_LIST_HEAD(&sbi->s_discard_list);
 	INIT_WORK(&sbi->s_discard_work, ext4_discard_work);
+	atomic_set(&sbi->s_retry_alloc_pending, 0);
 
 	sbi->s_mb_max_to_scan = MB_DEFAULT_MAX_TO_SCAN;
 	sbi->s_mb_min_to_scan = MB_DEFAULT_MIN_TO_SCAN;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH V4 4/5] ext4: get discard out of jbd2 commit kthread contex
  2021-08-30  7:52 ` [PATCH V4 4/5] ext4: get discard out of jbd2 commit kthread contex Wang Jianchao
@ 2021-08-31  4:00   ` Theodore Ts'o
  0 siblings, 0 replies; 7+ messages in thread
From: Theodore Ts'o @ 2021-08-31  4:00 UTC (permalink / raw)
  To: Wang Jianchao
  Cc: adilger.kernel, jack, guoqing.jiang, linux-kernel, linux-ext4

On Mon, Aug 30, 2021 at 03:52:45PM +0800, Wang Jianchao wrote:
> From: Wang Jianchao <wangjianchao@kuaishou.com>
> 
> Right now, discard is issued and waited to be completed in jbd2
> commit kthread context after the logs are committed. When large
> amount of files are deleted and discard is flooding, jbd2 commit
> kthread can be blocked for long time. Then all of the metadata
> operations can be blocked to wait the log space.
> 
> One case is the page fault path with read mm->mmap_sem held, which
> wants to update the file time but has to wait for the log space.
> When other threads in the task wants to do mmap, then write mmap_sem
> is blocked. Finally all of the following read mmap_sem requirements
> are blocked, even the ps command which need to read the /proc/pid/
> -cmdline. Our monitor service which needs to read /proc/pid/cmdline
> used to be blocked for 5 mins.
> 
> This patch frees the blocks back to buddy after commit and then do
> discard in a async kworker context in fstrim fashion, namely,
>  - mark blocks to be discarded as used if they have not been allocated
>  - do discard
>  - mark them free
> After this, jbd2 commit kthread won't be blocked any more by discard
> and we won't get NOSPC even if the discard is slow or throttled.
> 
> Link: https://marc.info/?l=linux-kernel&m=162143690731901&w=2
> Suggested-by: Theodore Ts'o <tytso@mit.edu>
> Reviewed-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>

I had applied the V3 version of this patch series for testing
purposes, and then accidentally included in the dev branch.  So an
earlier version of this patch series has been in the ext4 git tree for
a while.  I've done a comparison between the V3 and V4 patches, and
aside from a minor whitespace change in patch #1, the only patch that
had any real changes was this one (#4).  Patch #5 was also added in
the V4 series.

So I've edited the ext4 git tree using rebase magic, replacing patch
#4 and adding patch #5.

In other words, this is a slightly more complicated, "Thanks,
applied".  :-)

					- Ted

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-08-31  4:00 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-30  7:52 [PATCH V4 0/5] ext4: get discard out of jbd2 commit kthread Wang Jianchao
2021-08-30  7:52 ` [PATCH V4 1/5] ext4: remove the 'group' parameter of ext4_trim_extent Wang Jianchao
2021-08-30  7:52 ` [PATCH V4 2/5] ext4: add new helper interface ext4_try_to_trim_range() Wang Jianchao
2021-08-30  7:52 ` [PATCH V4 3/5] ext4: remove the repeated comment of ext4_trim_all_free Wang Jianchao
2021-08-30  7:52 ` [PATCH V4 4/5] ext4: get discard out of jbd2 commit kthread contex Wang Jianchao
2021-08-31  4:00   ` Theodore Ts'o
2021-08-30  7:52 ` [PATCH V4 5/5] ext4: flush background discard kwork when retry allocation Wang Jianchao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).