LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH] ext4: fix indirect punch hole corruption
@ 2015-02-05 20:50 Omar Sandoval
  2015-02-05 20:56 ` Omar Sandoval
  2015-02-05 21:30 ` Chris J Arges
  0 siblings, 2 replies; 15+ messages in thread
From: Omar Sandoval @ 2015-02-05 20:50 UTC (permalink / raw)
  To: Theodore Ts'o, Andreas Dilger, Lukáš Czerner,
	linux-ext4, linux-kernel
  Cc: Chris J Arges, Omar Sandoval

Commit 4f579ae7de56 (ext4: fix punch hole on files with indirect
mapping) rewrote FALLOC_FL_PUNCH_HOLE for ext4 files with indirect
mapping. However, the case where the punch happens within one level of
indirection is incorrect. It assumes that the partial branches returned
from ext4_find_shared will have the same depth, but this is not
necessarily the case even when the offsets have the same depth. For
example, if the last block occurs at the beginning of an indirect group
(i.e., it has an offset of 0 at the end of the offsets array), then
ext4_find_shared will return a shallower chain. So, let's handle the
mismatch and clean up that case. Tested with generic/270, which no
longer leads to an inconsistent filesystem like before.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
---
 fs/ext4/indirect.c | 61 +++++++++++++++++++++++++++++++-----------------------
 1 file changed, 35 insertions(+), 26 deletions(-)

diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
index 36b3696..e04bbce 100644
--- a/fs/ext4/indirect.c
+++ b/fs/ext4/indirect.c
@@ -1434,50 +1434,59 @@ end_range:
 	 * in punch_hole so we need to point to the next element
 	 */
 	partial2->p++;
-	while ((partial > chain) || (partial2 > chain2)) {
-		/* We're at the same block, so we're almost finished */
-		if ((partial->bh && partial2->bh) &&
-		    (partial->bh->b_blocknr == partial2->bh->b_blocknr)) {
-			if ((partial > chain) && (partial2 > chain2)) {
-				ext4_free_branches(handle, inode, partial->bh,
-						   partial->p + 1,
-						   partial2->p,
-						   (chain+n-1) - partial);
-				BUFFER_TRACE(partial->bh, "call brelse");
-				brelse(partial->bh);
-				BUFFER_TRACE(partial2->bh, "call brelse");
-				brelse(partial2->bh);
-			}
+	while (partial > chain && partial2 > chain2) {
+		int depth = (chain+n-1) - partial;
+		int depth2 = (chain2+n2-1) - partial2;
+
+		if (partial->bh->b_blocknr == partial2->bh->b_blocknr) {
+			/*
+			 * We've converged on the same block. Clear the range,
+			 * then we're done.
+			 */
+			ext4_free_branches(handle, inode, partial->bh,
+					   partial->p + 1,
+					   partial2->p,
+					   (chain+n-1) - partial);
+			BUFFER_TRACE(partial->bh, "call brelse");
+			brelse(partial->bh);
+			BUFFER_TRACE(partial2->bh, "call brelse");
+			brelse(partial2->bh);
 			return 0;
 		}
+
 		/*
-		 * Clear the ends of indirect blocks on the shared branch
-		 * at the start of the range
+		 * The start and end partial branches may not be at the same
+		 * level even though the punch happened within one level. So, we
+		 * give them a chance to arrive at the same level, then walk
+		 * them in step with each other until we converge on the same
+		 * block.
 		 */
-		if (partial > chain) {
+		if (depth <= depth2) {
 			ext4_free_branches(handle, inode, partial->bh,
-				   partial->p + 1,
-				   (__le32 *)partial->bh->b_data+addr_per_block,
-				   (chain+n-1) - partial);
+					   partial->p + 1,
+					   (__le32 *)partial->bh->b_data+addr_per_block,
+					   (chain+n-1) - partial);
 			BUFFER_TRACE(partial->bh, "call brelse");
 			brelse(partial->bh);
 			partial--;
 		}
-		/*
-		 * Clear the ends of indirect blocks on the shared branch
-		 * at the end of the range
-		 */
-		if (partial2 > chain2) {
+		if (depth2 <= depth) {
 			ext4_free_branches(handle, inode, partial2->bh,
 					   (__le32 *)partial2->bh->b_data,
 					   partial2->p,
-					   (chain2+n-1) - partial2);
+					   (chain2+n2-1) - partial2);
 			BUFFER_TRACE(partial2->bh, "call brelse");
 			brelse(partial2->bh);
 			partial2--;
 		}
 	}
 
+	/*
+	 * The punch happened within the same level, so at some point we _must_
+	 * converge on an indirect block.
+	 */
+	BUG_ON(1);
+
 do_indirects:
 	/* Kill the remaining (whole) subtrees */
 	switch (offsets[0]) {
-- 
2.2.2


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] ext4: fix indirect punch hole corruption
  2015-02-05 20:50 [PATCH] ext4: fix indirect punch hole corruption Omar Sandoval
@ 2015-02-05 20:56 ` Omar Sandoval
  2015-02-05 21:30 ` Chris J Arges
  1 sibling, 0 replies; 15+ messages in thread
From: Omar Sandoval @ 2015-02-05 20:56 UTC (permalink / raw)
  To: Theodore Ts'o, Andreas Dilger, Lukáš Czerner,
	linux-ext4, linux-kernel
  Cc: Chris J Arges

[-- Attachment #1: Type: text/plain, Size: 1718 bytes --]

On Thu, Feb 05, 2015 at 12:50:13PM -0800, Omar Sandoval wrote:
> Commit 4f579ae7de56 (ext4: fix punch hole on files with indirect
> mapping) rewrote FALLOC_FL_PUNCH_HOLE for ext4 files with indirect
> mapping. However, the case where the punch happens within one level of
> indirection is incorrect. It assumes that the partial branches returned
> from ext4_find_shared will have the same depth, but this is not
> necessarily the case even when the offsets have the same depth. For
> example, if the last block occurs at the beginning of an indirect group
> (i.e., it has an offset of 0 at the end of the offsets array), then
> ext4_find_shared will return a shallower chain. So, let's handle the
> mismatch and clean up that case. Tested with generic/270, which no
> longer leads to an inconsistent filesystem like before.
> 
Ah, forgot to mention that this applies to v3.19-rc7. Additionally,
I've attached a script which reproduces the issue. You can run
xfs_io -c fiemap "$FILE" (after an umount/mount of the filesystem just
be safe) and see that before the fix, a bunch of blocks get swallowed:

/mnt/scratch/test:
        0: [0..31]: 1026..1057
        1: [32..127]: 930..1025
        2: [128..255]: 2114..2241
        3: [256..511]: 2306..2561
        4: [512..535]: 2818..2841
        5: [536..1559]: hole

... but after the fix, the hole is in the right place:

/mnt/scratch/test:
        0: [0..31]: 1026..1057
        1: [32..127]: 930..1025
        2: [128..255]: 2114..2241
        3: [256..511]: 2306..2561
        4: [512..791]: 2818..3097
        5: [792..1047]: hole
        6: [1048..1559]: 4122..4633

(These examples are from a 1k-block ext3 filesystem with
CONFIG_EXT4_USE_FOR_EXT23=y).

-- 
Omar

[-- Attachment #2: ext4punch.sh --]
[-- Type: application/x-sh, Size: 606 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] ext4: fix indirect punch hole corruption
  2015-02-05 20:50 [PATCH] ext4: fix indirect punch hole corruption Omar Sandoval
  2015-02-05 20:56 ` Omar Sandoval
@ 2015-02-05 21:30 ` Chris J Arges
  2015-02-05 21:41   ` Omar Sandoval
  2015-02-07  0:28   ` Omar Sandoval
  1 sibling, 2 replies; 15+ messages in thread
From: Chris J Arges @ 2015-02-05 21:30 UTC (permalink / raw)
  To: Omar Sandoval, Theodore Ts'o, Andreas Dilger,
	Lukáš Czerner, linux-ext4, linux-kernel

On 02/05/2015 02:50 PM, Omar Sandoval wrote:
> Commit 4f579ae7de56 (ext4: fix punch hole on files with indirect
> mapping) rewrote FALLOC_FL_PUNCH_HOLE for ext4 files with indirect
> mapping. However, the case where the punch happens within one level of
> indirection is incorrect. It assumes that the partial branches returned
> from ext4_find_shared will have the same depth, but this is not
> necessarily the case even when the offsets have the same depth. For
> example, if the last block occurs at the beginning of an indirect group
> (i.e., it has an offset of 0 at the end of the offsets array), then
> ext4_find_shared will return a shallower chain. So, let's handle the
> mismatch and clean up that case. Tested with generic/270, which no
> longer leads to an inconsistent filesystem like before.
> 
> Signed-off-by: Omar Sandoval <osandov@osandov.com>


Omar,

Tried running this with my original reproducer (qcow2 snapshotting and
rebooting) and got the following:
------------[ cut here ]------------
kernel BUG at fs/ext4/indirect.c:1488!
invalid opcode: 0000 [#1] SMP
<snip>
CPU: 4 PID: 9771 Comm: qemu-img Tainted: G        W   E
3.19.0-rc7-b164aa5 #22
Hardware name: XXX
task: ffff880243a34aa0 ti: ffff880240f3c000 task.ti: ffff880240f3c000
RIP: 0010:[<ffffffff812a38e7>]  [<ffffffff812a38e7>]
ext4_ind_remove_space+0x737/0x740
RSP: 0018:ffff880240f3fc98  EFLAGS: 00010246
RAX: ffff880240f3fd98 RBX: ffff880240f3fd98 RCX: ffff880098c684dc
RDX: ffff880098c684d4 RSI: ffff880240f3fd08 RDI: ffff880098c684e0
RBP: ffff880240f3fdf8 R08: ffff880098c684e0 R09: 0000000000000000
R10: ffff880240f3faa0 R11: 0000000000000038 R12: ffff88009bb65810
R13: 0000000000000003 R14: ffff880240f3fd08 R15: ffff880240f3fd68
FS:  00007f7ad84ad700(0000) GS:ffff88024e500000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7ae19a78ff CR3: 0000000241c52000 CR4: 00000000003427e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
 ffffea0002596600 ffff88009bb65960 0000000000000050 0000000000000100
 ffff8802447a2900 0000000000000003 ffff880240f3fd08 ffff88009b96c030
 ffff880240f3fd08 000000000000024b 000000000000000d ffff880000000034
Call Trace:
 [<ffffffff8129ccdc>] ? ext4_discard_preallocations+0x16c/0x480
 [<ffffffff8126982f>] ext4_punch_hole+0x3bf/0x430
 [<ffffffff81293c9e>] ext4_fallocate+0x16e/0x8c0
 [<ffffffff811e4849>] ? __sb_start_write+0x49/0xf0
 [<ffffffff811df3cf>] vfs_fallocate+0x12f/0x250
 [<ffffffff810eda41>] ? SyS_futex+0x71/0x150
 [<ffffffff811e0408>] SyS_fallocate+0x48/0x80
 [<ffffffff8177cc2d>] system_call_fastpath+0x16/0x1b
Code: 40 4c 8d 0c c5 e8 ff ff ff 49 c1 f9 03 45 69 c9 ab aa aa aa e8 6b
e2 ff ff 48 8b 85 10 ff ff ff c7 00 00 00 00 00 e9 fd fa ff ff <0f> 0b
0f 0b 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41
RIP  [<ffffffff812a38e7>] ext4_ind_remove_space+0x737/0x740
 RSP <ffff880240f3fc98>
---[ end trace 05f053fdd5d908a8 ]---

So this is hitting the BUG_ON you added.
--chris

> ---
>  fs/ext4/indirect.c | 61 +++++++++++++++++++++++++++++++-----------------------
>  1 file changed, 35 insertions(+), 26 deletions(-)
> 
> diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
> index 36b3696..e04bbce 100644
> --- a/fs/ext4/indirect.c
> +++ b/fs/ext4/indirect.c
> @@ -1434,50 +1434,59 @@ end_range:
>  	 * in punch_hole so we need to point to the next element
>  	 */
>  	partial2->p++;
> -	while ((partial > chain) || (partial2 > chain2)) {
> -		/* We're at the same block, so we're almost finished */
> -		if ((partial->bh && partial2->bh) &&
> -		    (partial->bh->b_blocknr == partial2->bh->b_blocknr)) {
> -			if ((partial > chain) && (partial2 > chain2)) {
> -				ext4_free_branches(handle, inode, partial->bh,
> -						   partial->p + 1,
> -						   partial2->p,
> -						   (chain+n-1) - partial);
> -				BUFFER_TRACE(partial->bh, "call brelse");
> -				brelse(partial->bh);
> -				BUFFER_TRACE(partial2->bh, "call brelse");
> -				brelse(partial2->bh);
> -			}
> +	while (partial > chain && partial2 > chain2) {
> +		int depth = (chain+n-1) - partial;
> +		int depth2 = (chain2+n2-1) - partial2;
> +
> +		if (partial->bh->b_blocknr == partial2->bh->b_blocknr) {
> +			/*
> +			 * We've converged on the same block. Clear the range,
> +			 * then we're done.
> +			 */
> +			ext4_free_branches(handle, inode, partial->bh,
> +					   partial->p + 1,
> +					   partial2->p,
> +					   (chain+n-1) - partial);
> +			BUFFER_TRACE(partial->bh, "call brelse");
> +			brelse(partial->bh);
> +			BUFFER_TRACE(partial2->bh, "call brelse");
> +			brelse(partial2->bh);
>  			return 0;
>  		}
> +
>  		/*
> -		 * Clear the ends of indirect blocks on the shared branch
> -		 * at the start of the range
> +		 * The start and end partial branches may not be at the same
> +		 * level even though the punch happened within one level. So, we
> +		 * give them a chance to arrive at the same level, then walk
> +		 * them in step with each other until we converge on the same
> +		 * block.
>  		 */
> -		if (partial > chain) {
> +		if (depth <= depth2) {
>  			ext4_free_branches(handle, inode, partial->bh,
> -				   partial->p + 1,
> -				   (__le32 *)partial->bh->b_data+addr_per_block,
> -				   (chain+n-1) - partial);
> +					   partial->p + 1,
> +					   (__le32 *)partial->bh->b_data+addr_per_block,
> +					   (chain+n-1) - partial);
>  			BUFFER_TRACE(partial->bh, "call brelse");
>  			brelse(partial->bh);
>  			partial--;
>  		}
> -		/*
> -		 * Clear the ends of indirect blocks on the shared branch
> -		 * at the end of the range
> -		 */
> -		if (partial2 > chain2) {
> +		if (depth2 <= depth) {
>  			ext4_free_branches(handle, inode, partial2->bh,
>  					   (__le32 *)partial2->bh->b_data,
>  					   partial2->p,
> -					   (chain2+n-1) - partial2);
> +					   (chain2+n2-1) - partial2);
>  			BUFFER_TRACE(partial2->bh, "call brelse");
>  			brelse(partial2->bh);
>  			partial2--;
>  		}
>  	}
>  
> +	/*
> +	 * The punch happened within the same level, so at some point we _must_
> +	 * converge on an indirect block.
> +	 */
> +	BUG_ON(1);
> +
>  do_indirects:
>  	/* Kill the remaining (whole) subtrees */
>  	switch (offsets[0]) {
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] ext4: fix indirect punch hole corruption
  2015-02-05 21:30 ` Chris J Arges
@ 2015-02-05 21:41   ` Omar Sandoval
  2015-02-07  0:28   ` Omar Sandoval
  1 sibling, 0 replies; 15+ messages in thread
From: Omar Sandoval @ 2015-02-05 21:41 UTC (permalink / raw)
  To: Chris J Arges
  Cc: Theodore Ts'o, Andreas Dilger, Lukáš Czerner,
	linux-ext4, linux-kernel

On Thu, Feb 05, 2015 at 03:30:47PM -0600, Chris J Arges wrote:
> On 02/05/2015 02:50 PM, Omar Sandoval wrote:
> > Commit 4f579ae7de56 (ext4: fix punch hole on files with indirect
> > mapping) rewrote FALLOC_FL_PUNCH_HOLE for ext4 files with indirect
> > mapping. However, the case where the punch happens within one level of
> > indirection is incorrect. It assumes that the partial branches returned
> > from ext4_find_shared will have the same depth, but this is not
> > necessarily the case even when the offsets have the same depth. For
> > example, if the last block occurs at the beginning of an indirect group
> > (i.e., it has an offset of 0 at the end of the offsets array), then
> > ext4_find_shared will return a shallower chain. So, let's handle the
> > mismatch and clean up that case. Tested with generic/270, which no
> > longer leads to an inconsistent filesystem like before.
> > 
> > Signed-off-by: Omar Sandoval <osandov@osandov.com>
> 
> 
> Omar,
> 
> Tried running this with my original reproducer (qcow2 snapshotting and
> rebooting) and got the following:
> ------------[ cut here ]------------
> kernel BUG at fs/ext4/indirect.c:1488!
> invalid opcode: 0000 [#1] SMP
> <snip>
> CPU: 4 PID: 9771 Comm: qemu-img Tainted: G        W   E
> 3.19.0-rc7-b164aa5 #22
> Hardware name: XXX
> task: ffff880243a34aa0 ti: ffff880240f3c000 task.ti: ffff880240f3c000
> RIP: 0010:[<ffffffff812a38e7>]  [<ffffffff812a38e7>]
> ext4_ind_remove_space+0x737/0x740
> RSP: 0018:ffff880240f3fc98  EFLAGS: 00010246
> RAX: ffff880240f3fd98 RBX: ffff880240f3fd98 RCX: ffff880098c684dc
> RDX: ffff880098c684d4 RSI: ffff880240f3fd08 RDI: ffff880098c684e0
> RBP: ffff880240f3fdf8 R08: ffff880098c684e0 R09: 0000000000000000
> R10: ffff880240f3faa0 R11: 0000000000000038 R12: ffff88009bb65810
> R13: 0000000000000003 R14: ffff880240f3fd08 R15: ffff880240f3fd68
> FS:  00007f7ad84ad700(0000) GS:ffff88024e500000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f7ae19a78ff CR3: 0000000241c52000 CR4: 00000000003427e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Stack:
>  ffffea0002596600 ffff88009bb65960 0000000000000050 0000000000000100
>  ffff8802447a2900 0000000000000003 ffff880240f3fd08 ffff88009b96c030
>  ffff880240f3fd08 000000000000024b 000000000000000d ffff880000000034
> Call Trace:
>  [<ffffffff8129ccdc>] ? ext4_discard_preallocations+0x16c/0x480
>  [<ffffffff8126982f>] ext4_punch_hole+0x3bf/0x430
>  [<ffffffff81293c9e>] ext4_fallocate+0x16e/0x8c0
>  [<ffffffff811e4849>] ? __sb_start_write+0x49/0xf0
>  [<ffffffff811df3cf>] vfs_fallocate+0x12f/0x250
>  [<ffffffff810eda41>] ? SyS_futex+0x71/0x150
>  [<ffffffff811e0408>] SyS_fallocate+0x48/0x80
>  [<ffffffff8177cc2d>] system_call_fastpath+0x16/0x1b
> Code: 40 4c 8d 0c c5 e8 ff ff ff 49 c1 f9 03 45 69 c9 ab aa aa aa e8 6b
> e2 ff ff 48 8b 85 10 ff ff ff c7 00 00 00 00 00 e9 fd fa ff ff <0f> 0b
> 0f 0b 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41
> RIP  [<ffffffff812a38e7>] ext4_ind_remove_space+0x737/0x740
>  RSP <ffff880240f3fc98>
> ---[ end trace 05f053fdd5d908a8 ]---
> 
> So this is hitting the BUG_ON you added.
> --chris
> 
Ah, yes, thanks Chris, my logic there was wrong. For example, punching
from the beginning of the first doubly-indirected block to, say, the
100th doubly-indirected block, will hit this. Gimme a sec and I'll fix
it.

-- 
Omar

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] ext4: fix indirect punch hole corruption
  2015-02-05 21:30 ` Chris J Arges
  2015-02-05 21:41   ` Omar Sandoval
@ 2015-02-07  0:28   ` Omar Sandoval
  2015-02-07  0:35     ` Chris J Arges
  1 sibling, 1 reply; 15+ messages in thread
From: Omar Sandoval @ 2015-02-07  0:28 UTC (permalink / raw)
  To: Chris J Arges
  Cc: Theodore Ts'o, Andreas Dilger, Lukáš Czerner,
	linux-ext4, linux-kernel

On Thu, Feb 05, 2015 at 03:30:47PM -0600, Chris J Arges wrote:
> On 02/05/2015 02:50 PM, Omar Sandoval wrote:
> > Commit 4f579ae7de56 (ext4: fix punch hole on files with indirect
> > mapping) rewrote FALLOC_FL_PUNCH_HOLE for ext4 files with indirect
> > mapping. However, the case where the punch happens within one level of
> > indirection is incorrect. It assumes that the partial branches returned
> > from ext4_find_shared will have the same depth, but this is not
> > necessarily the case even when the offsets have the same depth. For
> > example, if the last block occurs at the beginning of an indirect group
> > (i.e., it has an offset of 0 at the end of the offsets array), then
> > ext4_find_shared will return a shallower chain. So, let's handle the
> > mismatch and clean up that case. Tested with generic/270, which no
> > longer leads to an inconsistent filesystem like before.
> > 
> > Signed-off-by: Omar Sandoval <osandov@osandov.com>
> 
> 
> Omar,
> 
> Tried running this with my original reproducer (qcow2 snapshotting and
> rebooting) and got the following:
> ------------[ cut here ]------------
> kernel BUG at fs/ext4/indirect.c:1488!
> invalid opcode: 0000 [#1] SMP
> <snip>
> CPU: 4 PID: 9771 Comm: qemu-img Tainted: G        W   E
> 3.19.0-rc7-b164aa5 #22
> Hardware name: XXX
> task: ffff880243a34aa0 ti: ffff880240f3c000 task.ti: ffff880240f3c000
> RIP: 0010:[<ffffffff812a38e7>]  [<ffffffff812a38e7>]
> ext4_ind_remove_space+0x737/0x740
> RSP: 0018:ffff880240f3fc98  EFLAGS: 00010246
> RAX: ffff880240f3fd98 RBX: ffff880240f3fd98 RCX: ffff880098c684dc
> RDX: ffff880098c684d4 RSI: ffff880240f3fd08 RDI: ffff880098c684e0
> RBP: ffff880240f3fdf8 R08: ffff880098c684e0 R09: 0000000000000000
> R10: ffff880240f3faa0 R11: 0000000000000038 R12: ffff88009bb65810
> R13: 0000000000000003 R14: ffff880240f3fd08 R15: ffff880240f3fd68
> FS:  00007f7ad84ad700(0000) GS:ffff88024e500000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f7ae19a78ff CR3: 0000000241c52000 CR4: 00000000003427e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Stack:
>  ffffea0002596600 ffff88009bb65960 0000000000000050 0000000000000100
>  ffff8802447a2900 0000000000000003 ffff880240f3fd08 ffff88009b96c030
>  ffff880240f3fd08 000000000000024b 000000000000000d ffff880000000034
> Call Trace:
>  [<ffffffff8129ccdc>] ? ext4_discard_preallocations+0x16c/0x480
>  [<ffffffff8126982f>] ext4_punch_hole+0x3bf/0x430
>  [<ffffffff81293c9e>] ext4_fallocate+0x16e/0x8c0
>  [<ffffffff811e4849>] ? __sb_start_write+0x49/0xf0
>  [<ffffffff811df3cf>] vfs_fallocate+0x12f/0x250
>  [<ffffffff810eda41>] ? SyS_futex+0x71/0x150
>  [<ffffffff811e0408>] SyS_fallocate+0x48/0x80
>  [<ffffffff8177cc2d>] system_call_fastpath+0x16/0x1b
> Code: 40 4c 8d 0c c5 e8 ff ff ff 49 c1 f9 03 45 69 c9 ab aa aa aa e8 6b
> e2 ff ff 48 8b 85 10 ff ff ff c7 00 00 00 00 00 e9 fd fa ff ff <0f> 0b
> 0f 0b 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41
> RIP  [<ffffffff812a38e7>] ext4_ind_remove_space+0x737/0x740
>  RSP <ffff880240f3fc98>
> ---[ end trace 05f053fdd5d908a8 ]---
> 
> So this is hitting the BUG_ON you added.
> --chris
> 

Chris,

Would you mind sending over your exact reproduction steps? I've cooked
up a patch which appears to be working, but I want to test it out a bit
more before I send it out.

Thanks,
-- 
Omar

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] ext4: fix indirect punch hole corruption
  2015-02-07  0:28   ` Omar Sandoval
@ 2015-02-07  0:35     ` Chris J Arges
  2015-02-07 21:57       ` [PATCH v2] " Omar Sandoval
  0 siblings, 1 reply; 15+ messages in thread
From: Chris J Arges @ 2015-02-07  0:35 UTC (permalink / raw)
  To: Omar Sandoval
  Cc: Theodore Ts'o, Andreas Dilger, Lukáš Czerner,
	linux-ext4, linux-kernel

On 02/06/2015 06:28 PM, Omar Sandoval wrote:
> On Thu, Feb 05, 2015 at 03:30:47PM -0600, Chris J Arges wrote:
>> On 02/05/2015 02:50 PM, Omar Sandoval wrote:
>>> Commit 4f579ae7de56 (ext4: fix punch hole on files with indirect
>>> mapping) rewrote FALLOC_FL_PUNCH_HOLE for ext4 files with indirect
>>> mapping. However, the case where the punch happens within one level of
>>> indirection is incorrect. It assumes that the partial branches returned
>>> from ext4_find_shared will have the same depth, but this is not
>>> necessarily the case even when the offsets have the same depth. For
>>> example, if the last block occurs at the beginning of an indirect group
>>> (i.e., it has an offset of 0 at the end of the offsets array), then
>>> ext4_find_shared will return a shallower chain. So, let's handle the
>>> mismatch and clean up that case. Tested with generic/270, which no
>>> longer leads to an inconsistent filesystem like before.
>>>
>>> Signed-off-by: Omar Sandoval <osandov@osandov.com>
>>
>>
>> Omar,
>>
>> Tried running this with my original reproducer (qcow2 snapshotting and
>> rebooting) and got the following:
>> ------------[ cut here ]------------
>> kernel BUG at fs/ext4/indirect.c:1488!
>> invalid opcode: 0000 [#1] SMP
>> <snip>
>> CPU: 4 PID: 9771 Comm: qemu-img Tainted: G        W   E
>> 3.19.0-rc7-b164aa5 #22
>> Hardware name: XXX
>> task: ffff880243a34aa0 ti: ffff880240f3c000 task.ti: ffff880240f3c000
>> RIP: 0010:[<ffffffff812a38e7>]  [<ffffffff812a38e7>]
>> ext4_ind_remove_space+0x737/0x740
>> RSP: 0018:ffff880240f3fc98  EFLAGS: 00010246
>> RAX: ffff880240f3fd98 RBX: ffff880240f3fd98 RCX: ffff880098c684dc
>> RDX: ffff880098c684d4 RSI: ffff880240f3fd08 RDI: ffff880098c684e0
>> RBP: ffff880240f3fdf8 R08: ffff880098c684e0 R09: 0000000000000000
>> R10: ffff880240f3faa0 R11: 0000000000000038 R12: ffff88009bb65810
>> R13: 0000000000000003 R14: ffff880240f3fd08 R15: ffff880240f3fd68
>> FS:  00007f7ad84ad700(0000) GS:ffff88024e500000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00007f7ae19a78ff CR3: 0000000241c52000 CR4: 00000000003427e0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Stack:
>>  ffffea0002596600 ffff88009bb65960 0000000000000050 0000000000000100
>>  ffff8802447a2900 0000000000000003 ffff880240f3fd08 ffff88009b96c030
>>  ffff880240f3fd08 000000000000024b 000000000000000d ffff880000000034
>> Call Trace:
>>  [<ffffffff8129ccdc>] ? ext4_discard_preallocations+0x16c/0x480
>>  [<ffffffff8126982f>] ext4_punch_hole+0x3bf/0x430
>>  [<ffffffff81293c9e>] ext4_fallocate+0x16e/0x8c0
>>  [<ffffffff811e4849>] ? __sb_start_write+0x49/0xf0
>>  [<ffffffff811df3cf>] vfs_fallocate+0x12f/0x250
>>  [<ffffffff810eda41>] ? SyS_futex+0x71/0x150
>>  [<ffffffff811e0408>] SyS_fallocate+0x48/0x80
>>  [<ffffffff8177cc2d>] system_call_fastpath+0x16/0x1b
>> Code: 40 4c 8d 0c c5 e8 ff ff ff 49 c1 f9 03 45 69 c9 ab aa aa aa e8 6b
>> e2 ff ff 48 8b 85 10 ff ff ff c7 00 00 00 00 00 e9 fd fa ff ff <0f> 0b
>> 0f 0b 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41
>> RIP  [<ffffffff812a38e7>] ext4_ind_remove_space+0x737/0x740
>>  RSP <ffff880240f3fc98>
>> ---[ end trace 05f053fdd5d908a8 ]---
>>
>> So this is hitting the BUG_ON you added.
>> --chris
>>
> 
> Chris,
> 
> Would you mind sending over your exact reproduction steps? I've cooked
> up a patch which appears to be working, but I want to test it out a bit
> more before I send it out.
> 
> Thanks,
> 

Omar,

To reproduce this issue do the following:
1) Setup ext4 ^extents, or ext3 filesystem with CONFIG_EXT4_USE_FOR_EXT23=y
2) Create and install a VM using a qcow2 image and store the file on the
filesystem
3) Snapshot the image with qemu-img
4) Boot the image and do some disk operations (fio,etc)
5) Shutdown image and delete snapshot
6) Repeat 3-5 until VM no longer boots due to image corruption,
generally this takes a few iterations depending on disk operations.

I have the reproducer ready to go on my system to feel free to send me a
preliminary patch and I'll test it again over the weekend.
Thanks,
--chris j arges

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2] ext4: fix indirect punch hole corruption
  2015-02-07  0:35     ` Chris J Arges
@ 2015-02-07 21:57       ` Omar Sandoval
  2015-02-08 12:15         ` [PATCH v3] " Omar Sandoval
  0 siblings, 1 reply; 15+ messages in thread
From: Omar Sandoval @ 2015-02-07 21:57 UTC (permalink / raw)
  To: Theodore Ts'o, Andreas Dilger, Lukáš Czerner,
	Chris J Arges
  Cc: linux-ext4, linux-kernel, Omar Sandoval

Commit 4f579ae7de56 (ext4: fix punch hole on files with indirect
mapping) rewrote FALLOC_FL_PUNCH_HOLE for ext4 files with indirect
mapping. However, the case where the punch happens within one level of
indirection is incorrect. It assumes that the partial branches returned
from ext4_find_shared will have the same depth, but this is not
necessarily the case even when the offsets have the same depth. For
example, if the last block occurs at the beginning of an indirect group
(i.e., it has an offset of 0 at the end of the offsets array), then
ext4_find_shared will return a shallower chain. So, let's handle the
mismatch and clean up that case.

This should reflect the original intent of the code. However, I'm still
seeing failures of generic/270 due to an inconsistent filesystem after
many runs, so there could be another bug here, or it could be unrelated.
I'll keep working on tracking that down.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
---
Once again, this applies to v3.19-rc7. Chris, could you take this one
for a spin when you get the chance? Hopefully you'll be able to confirm
whether the corruption is still here or unrelated.

Changes from v1:
Handle partial == chain || partial2 == chain2 cases.

 fs/ext4/indirect.c | 57 +++++++++++++++++++++++++++++-------------------------
 1 file changed, 31 insertions(+), 26 deletions(-)

diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
index 36b3696..3ff325f 100644
--- a/fs/ext4/indirect.c
+++ b/fs/ext4/indirect.c
@@ -1434,49 +1434,54 @@ end_range:
 	 * in punch_hole so we need to point to the next element
 	 */
 	partial2->p++;
-	while ((partial > chain) || (partial2 > chain2)) {
-		/* We're at the same block, so we're almost finished */
-		if ((partial->bh && partial2->bh) &&
-		    (partial->bh->b_blocknr == partial2->bh->b_blocknr)) {
-			if ((partial > chain) && (partial2 > chain2)) {
-				ext4_free_branches(handle, inode, partial->bh,
-						   partial->p + 1,
-						   partial2->p,
-						   (chain+n-1) - partial);
-				BUFFER_TRACE(partial->bh, "call brelse");
-				brelse(partial->bh);
-				BUFFER_TRACE(partial2->bh, "call brelse");
-				brelse(partial2->bh);
-			}
+	while (partial > chain || partial2 > chain2) {
+		int depth = (chain+n-1) - partial;
+		int depth2 = (chain2+n2-1) - partial2;
+
+		if (partial > chain && partial2 > chain2 &&
+		    partial->bh->b_blocknr == partial2->bh->b_blocknr) {
+			/*
+			 * We've converged on the same block. Clear the range,
+			 * then we're done.
+			 */
+			ext4_free_branches(handle, inode, partial->bh,
+					   partial->p + 1,
+					   partial2->p,
+					   (chain+n-1) - partial);
+			BUFFER_TRACE(partial->bh, "call brelse");
+			brelse(partial->bh);
+			BUFFER_TRACE(partial2->bh, "call brelse");
+			brelse(partial2->bh);
 			return 0;
 		}
+
 		/*
-		 * Clear the ends of indirect blocks on the shared branch
-		 * at the start of the range
+		 * The start and end partial branches may not be at the same
+		 * level even though the punch happened within one level. So, we
+		 * give them a chance to arrive at the same level, then walk
+		 * them in step with each other until we converge on the same
+		 * block.
 		 */
-		if (partial > chain) {
+		if (partial > chain && depth <= depth2) {
 			ext4_free_branches(handle, inode, partial->bh,
-				   partial->p + 1,
-				   (__le32 *)partial->bh->b_data+addr_per_block,
-				   (chain+n-1) - partial);
+					   partial->p + 1,
+					   (__le32 *)partial->bh->b_data+addr_per_block,
+					   (chain+n-1) - partial);
 			BUFFER_TRACE(partial->bh, "call brelse");
 			brelse(partial->bh);
 			partial--;
 		}
-		/*
-		 * Clear the ends of indirect blocks on the shared branch
-		 * at the end of the range
-		 */
-		if (partial2 > chain2) {
+		if (partial2 > chain2 && depth2 <= depth) {
 			ext4_free_branches(handle, inode, partial2->bh,
 					   (__le32 *)partial2->bh->b_data,
 					   partial2->p,
-					   (chain2+n-1) - partial2);
+					   (chain2+n2-1) - partial2);
 			BUFFER_TRACE(partial2->bh, "call brelse");
 			brelse(partial2->bh);
 			partial2--;
 		}
 	}
+	return 0;
 
 do_indirects:
 	/* Kill the remaining (whole) subtrees */
-- 
2.3.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v3] ext4: fix indirect punch hole corruption
  2015-02-07 21:57       ` [PATCH v2] " Omar Sandoval
@ 2015-02-08 12:15         ` Omar Sandoval
  2015-02-09 18:21           ` Chris J Arges
  0 siblings, 1 reply; 15+ messages in thread
From: Omar Sandoval @ 2015-02-08 12:15 UTC (permalink / raw)
  To: Theodore Ts'o, Andreas Dilger, Lukáš Czerner,
	Chris J Arges
  Cc: linux-ext4, linux-kernel, Omar Sandoval

Commit 4f579ae7de56 (ext4: fix punch hole on files with indirect
mapping) rewrote FALLOC_FL_PUNCH_HOLE for ext4 files with indirect
mapping. However, there are a bugs in a few cases.

In the case where the punch happens within one level of indirection, we
expect the start and end shared branches to converge on an indirect
block. However, because the branches returned from ext4_find_shared do
not necessarily start at the same level (e.g., the partial2 chain will
be shallower if the last block occurs at the beginning of an indirect
group), the walk of the two chains can end up "missing" each other and
freeing a bunch of extra blocks in the process. This mismatch can be
handled by first making sure that the chains are at the same level, then
walking them together until they converge.

In the case that a punch spans different levels of indirection, the
original code skips freeing the intermediate indirect trees if the last
block is the first triply-indirected block because it returns instead of
jumping to do_indirects. Additionally, a non-zero nr2 does not mean that
there's nothing else to free at the level of partial2: consider the case
where the all_zeroes in ext4_find_shared backed up the shared branch.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
---
Here's a couple more fixes folded in. Still applies to v3.19-rc7.

Changes from v2:
Handle skipped do_indirects when n < 4, n2 == 4, and partial2 == chain2
and skipped ext4_free_branches when nr2 != 0

Changes from v1:
Handle partial == chain || partial2 == chain2 cases.
 fs/ext4/indirect.c | 62 ++++++++++++++++++++++++++++--------------------------
 1 file changed, 32 insertions(+), 30 deletions(-)

diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
index 36b3696..279d9ba 100644
--- a/fs/ext4/indirect.c
+++ b/fs/ext4/indirect.c
@@ -1393,10 +1393,7 @@ end_range:
 				 * to free. Everything was covered by the start
 				 * of the range.
 				 */
-				return 0;
-			} else {
-				/* Shared branch grows from an indirect block */
-				partial2--;
+				goto do_indirects;
 			}
 		} else {
 			/*
@@ -1434,49 +1431,54 @@ end_range:
 	 * in punch_hole so we need to point to the next element
 	 */
 	partial2->p++;
-	while ((partial > chain) || (partial2 > chain2)) {
-		/* We're at the same block, so we're almost finished */
-		if ((partial->bh && partial2->bh) &&
-		    (partial->bh->b_blocknr == partial2->bh->b_blocknr)) {
-			if ((partial > chain) && (partial2 > chain2)) {
-				ext4_free_branches(handle, inode, partial->bh,
-						   partial->p + 1,
-						   partial2->p,
-						   (chain+n-1) - partial);
-				BUFFER_TRACE(partial->bh, "call brelse");
-				brelse(partial->bh);
-				BUFFER_TRACE(partial2->bh, "call brelse");
-				brelse(partial2->bh);
-			}
+	while (partial > chain || partial2 > chain2) {
+		int depth = (chain+n-1) - partial;
+		int depth2 = (chain2+n2-1) - partial2;
+
+		if (partial > chain && partial2 > chain2 &&
+		    partial->bh->b_blocknr == partial2->bh->b_blocknr) {
+			/*
+			 * We've converged on the same block. Clear the range,
+			 * then we're done.
+			 */
+			ext4_free_branches(handle, inode, partial->bh,
+					   partial->p + 1,
+					   partial2->p,
+					   (chain+n-1) - partial);
+			BUFFER_TRACE(partial->bh, "call brelse");
+			brelse(partial->bh);
+			BUFFER_TRACE(partial2->bh, "call brelse");
+			brelse(partial2->bh);
 			return 0;
 		}
+
 		/*
-		 * Clear the ends of indirect blocks on the shared branch
-		 * at the start of the range
+		 * The start and end partial branches may not be at the same
+		 * level even though the punch happened within one level. So, we
+		 * give them a chance to arrive at the same level, then walk
+		 * them in step with each other until we converge on the same
+		 * block.
 		 */
-		if (partial > chain) {
+		if (partial > chain && depth <= depth2) {
 			ext4_free_branches(handle, inode, partial->bh,
-				   partial->p + 1,
-				   (__le32 *)partial->bh->b_data+addr_per_block,
-				   (chain+n-1) - partial);
+					   partial->p + 1,
+					   (__le32 *)partial->bh->b_data+addr_per_block,
+					   (chain+n-1) - partial);
 			BUFFER_TRACE(partial->bh, "call brelse");
 			brelse(partial->bh);
 			partial--;
 		}
-		/*
-		 * Clear the ends of indirect blocks on the shared branch
-		 * at the end of the range
-		 */
-		if (partial2 > chain2) {
+		if (partial2 > chain2 && depth2 <= depth) {
 			ext4_free_branches(handle, inode, partial2->bh,
 					   (__le32 *)partial2->bh->b_data,
 					   partial2->p,
-					   (chain2+n-1) - partial2);
+					   (chain2+n2-1) - partial2);
 			BUFFER_TRACE(partial2->bh, "call brelse");
 			brelse(partial2->bh);
 			partial2--;
 		}
 	}
+	return 0;
 
 do_indirects:
 	/* Kill the remaining (whole) subtrees */
-- 
2.3.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] ext4: fix indirect punch hole corruption
  2015-02-08 12:15         ` [PATCH v3] " Omar Sandoval
@ 2015-02-09 18:21           ` Chris J Arges
  2015-02-09 21:03             ` Chris J Arges
  0 siblings, 1 reply; 15+ messages in thread
From: Chris J Arges @ 2015-02-09 18:21 UTC (permalink / raw)
  To: Omar Sandoval, Theodore Ts'o, Andreas Dilger,
	Lukáš Czerner
  Cc: linux-ext4, linux-kernel

On 02/08/2015 06:15 AM, Omar Sandoval wrote:
> Commit 4f579ae7de56 (ext4: fix punch hole on files with indirect
> mapping) rewrote FALLOC_FL_PUNCH_HOLE for ext4 files with indirect
> mapping. However, there are a bugs in a few cases.
> 
> In the case where the punch happens within one level of indirection, we
> expect the start and end shared branches to converge on an indirect
> block. However, because the branches returned from ext4_find_shared do
> not necessarily start at the same level (e.g., the partial2 chain will
> be shallower if the last block occurs at the beginning of an indirect
> group), the walk of the two chains can end up "missing" each other and
> freeing a bunch of extra blocks in the process. This mismatch can be
> handled by first making sure that the chains are at the same level, then
> walking them together until they converge.
> 
> In the case that a punch spans different levels of indirection, the
> original code skips freeing the intermediate indirect trees if the last
> block is the first triply-indirected block because it returns instead of
> jumping to do_indirects. Additionally, a non-zero nr2 does not mean that
> there's nothing else to free at the level of partial2: consider the case
> where the all_zeroes in ext4_find_shared backed up the shared branch.
> 
> Signed-off-by: Omar Sandoval <osandov@osandov.com>

Omar,
With this patch I no longer seem to be getting the original corruption I
detected with my test case; however eventually I do get errors when
trying to delete qcow2 snapshots. After getting these errors if I run
'qemu-img check <image>' I see the following errors:

ERROR OFLAG_COPIED data cluster: l2_entry=800000018f7f0000 refcount=0
ERROR OFLAG_COPIED data cluster: l2_entry=800000018f800000 refcount=0
ERROR OFLAG_COPIED data cluster: l2_entry=800000018f810000 refcount=0

16941 errors were found on the image.
Data may be corrupted, or further writes to the image may corrupt it.

60459 leaked clusters were found on the image.
This means waste of disk space, but no harm to data.
88629/262144 = 33.81% allocated, 9.57% fragmented, 0.00% compressed clusters
Image end offset: 10438180864

So this patch seems to have moved the problem. I can collect additional
logs if necessary.

Thanks,
--chris j arges

> ---
> Here's a couple more fixes folded in. Still applies to v3.19-rc7.
> 
> Changes from v2:
> Handle skipped do_indirects when n < 4, n2 == 4, and partial2 == chain2
> and skipped ext4_free_branches when nr2 != 0
> 
> Changes from v1:
> Handle partial == chain || partial2 == chain2 cases.
>  fs/ext4/indirect.c | 62 ++++++++++++++++++++++++++++--------------------------
>  1 file changed, 32 insertions(+), 30 deletions(-)
> 
> diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
> index 36b3696..279d9ba 100644
> --- a/fs/ext4/indirect.c
> +++ b/fs/ext4/indirect.c
> @@ -1393,10 +1393,7 @@ end_range:
>  				 * to free. Everything was covered by the start
>  				 * of the range.
>  				 */
> -				return 0;
> -			} else {
> -				/* Shared branch grows from an indirect block */
> -				partial2--;
> +				goto do_indirects;
>  			}
>  		} else {
>  			/*
> @@ -1434,49 +1431,54 @@ end_range:
>  	 * in punch_hole so we need to point to the next element
>  	 */
>  	partial2->p++;
> -	while ((partial > chain) || (partial2 > chain2)) {
> -		/* We're at the same block, so we're almost finished */
> -		if ((partial->bh && partial2->bh) &&
> -		    (partial->bh->b_blocknr == partial2->bh->b_blocknr)) {
> -			if ((partial > chain) && (partial2 > chain2)) {
> -				ext4_free_branches(handle, inode, partial->bh,
> -						   partial->p + 1,
> -						   partial2->p,
> -						   (chain+n-1) - partial);
> -				BUFFER_TRACE(partial->bh, "call brelse");
> -				brelse(partial->bh);
> -				BUFFER_TRACE(partial2->bh, "call brelse");
> -				brelse(partial2->bh);
> -			}
> +	while (partial > chain || partial2 > chain2) {
> +		int depth = (chain+n-1) - partial;
> +		int depth2 = (chain2+n2-1) - partial2;
> +
> +		if (partial > chain && partial2 > chain2 &&
> +		    partial->bh->b_blocknr == partial2->bh->b_blocknr) {
> +			/*
> +			 * We've converged on the same block. Clear the range,
> +			 * then we're done.
> +			 */
> +			ext4_free_branches(handle, inode, partial->bh,
> +					   partial->p + 1,
> +					   partial2->p,
> +					   (chain+n-1) - partial);
> +			BUFFER_TRACE(partial->bh, "call brelse");
> +			brelse(partial->bh);
> +			BUFFER_TRACE(partial2->bh, "call brelse");
> +			brelse(partial2->bh);
>  			return 0;
>  		}
> +
>  		/*
> -		 * Clear the ends of indirect blocks on the shared branch
> -		 * at the start of the range
> +		 * The start and end partial branches may not be at the same
> +		 * level even though the punch happened within one level. So, we
> +		 * give them a chance to arrive at the same level, then walk
> +		 * them in step with each other until we converge on the same
> +		 * block.
>  		 */
> -		if (partial > chain) {
> +		if (partial > chain && depth <= depth2) {
>  			ext4_free_branches(handle, inode, partial->bh,
> -				   partial->p + 1,
> -				   (__le32 *)partial->bh->b_data+addr_per_block,
> -				   (chain+n-1) - partial);
> +					   partial->p + 1,
> +					   (__le32 *)partial->bh->b_data+addr_per_block,
> +					   (chain+n-1) - partial);
>  			BUFFER_TRACE(partial->bh, "call brelse");
>  			brelse(partial->bh);
>  			partial--;
>  		}
> -		/*
> -		 * Clear the ends of indirect blocks on the shared branch
> -		 * at the end of the range
> -		 */
> -		if (partial2 > chain2) {
> +		if (partial2 > chain2 && depth2 <= depth) {
>  			ext4_free_branches(handle, inode, partial2->bh,
>  					   (__le32 *)partial2->bh->b_data,
>  					   partial2->p,
> -					   (chain2+n-1) - partial2);
> +					   (chain2+n2-1) - partial2);
>  			BUFFER_TRACE(partial2->bh, "call brelse");
>  			brelse(partial2->bh);
>  			partial2--;
>  		}
>  	}
> +	return 0;
>  
>  do_indirects:
>  	/* Kill the remaining (whole) subtrees */
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] ext4: fix indirect punch hole corruption
  2015-02-09 18:21           ` Chris J Arges
@ 2015-02-09 21:03             ` Chris J Arges
  2015-02-09 21:28               ` Omar Sandoval
  0 siblings, 1 reply; 15+ messages in thread
From: Chris J Arges @ 2015-02-09 21:03 UTC (permalink / raw)
  To: Omar Sandoval, Theodore Ts'o, Andreas Dilger,
	Lukáš Czerner
  Cc: linux-ext4, linux-kernel

On 02/09/2015 12:21 PM, Chris J Arges wrote:
> On 02/08/2015 06:15 AM, Omar Sandoval wrote:
>> Commit 4f579ae7de56 (ext4: fix punch hole on files with indirect
>> mapping) rewrote FALLOC_FL_PUNCH_HOLE for ext4 files with indirect
>> mapping. However, there are a bugs in a few cases.
>>
>> In the case where the punch happens within one level of indirection, we
>> expect the start and end shared branches to converge on an indirect
>> block. However, because the branches returned from ext4_find_shared do
>> not necessarily start at the same level (e.g., the partial2 chain will
>> be shallower if the last block occurs at the beginning of an indirect
>> group), the walk of the two chains can end up "missing" each other and
>> freeing a bunch of extra blocks in the process. This mismatch can be
>> handled by first making sure that the chains are at the same level, then
>> walking them together until they converge.
>>
>> In the case that a punch spans different levels of indirection, the
>> original code skips freeing the intermediate indirect trees if the last
>> block is the first triply-indirected block because it returns instead of
>> jumping to do_indirects. Additionally, a non-zero nr2 does not mean that
>> there's nothing else to free at the level of partial2: consider the case
>> where the all_zeroes in ext4_find_shared backed up the shared branch.
>>
>> Signed-off-by: Omar Sandoval <osandov@osandov.com>
> 
> Omar,
> With this patch I no longer seem to be getting the original corruption I
> detected with my test case; however eventually I do get errors when
> trying to delete qcow2 snapshots. After getting these errors if I run
> 'qemu-img check <image>' I see the following errors:
> 
> ERROR OFLAG_COPIED data cluster: l2_entry=800000018f7f0000 refcount=0
> ERROR OFLAG_COPIED data cluster: l2_entry=800000018f800000 refcount=0
> ERROR OFLAG_COPIED data cluster: l2_entry=800000018f810000 refcount=0
> 
> 16941 errors were found on the image.
> Data may be corrupted, or further writes to the image may corrupt it.
> 
> 60459 leaked clusters were found on the image.
> This means waste of disk space, but no harm to data.
> 88629/262144 = 33.81% allocated, 9.57% fragmented, 0.00% compressed clusters
> Image end offset: 10438180864
> 
> So this patch seems to have moved the problem. I can collect additional
> logs if necessary.
> 
> Thanks,
> --chris j arges
> 

After ignoring snapshot deletion errors, I've hit the original
corruption problem with your patch still. I'll continue debugging this.
--chris j arges

>> ---
>> Here's a couple more fixes folded in. Still applies to v3.19-rc7.
>>
>> Changes from v2:
>> Handle skipped do_indirects when n < 4, n2 == 4, and partial2 == chain2
>> and skipped ext4_free_branches when nr2 != 0
>>
>> Changes from v1:
>> Handle partial == chain || partial2 == chain2 cases.
>>  fs/ext4/indirect.c | 62 ++++++++++++++++++++++++++++--------------------------
>>  1 file changed, 32 insertions(+), 30 deletions(-)
>>
>> diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
>> index 36b3696..279d9ba 100644
>> --- a/fs/ext4/indirect.c
>> +++ b/fs/ext4/indirect.c
>> @@ -1393,10 +1393,7 @@ end_range:
>>  				 * to free. Everything was covered by the start
>>  				 * of the range.
>>  				 */
>> -				return 0;
>> -			} else {
>> -				/* Shared branch grows from an indirect block */
>> -				partial2--;
>> +				goto do_indirects;
>>  			}
>>  		} else {
>>  			/*
>> @@ -1434,49 +1431,54 @@ end_range:
>>  	 * in punch_hole so we need to point to the next element
>>  	 */
>>  	partial2->p++;
>> -	while ((partial > chain) || (partial2 > chain2)) {
>> -		/* We're at the same block, so we're almost finished */
>> -		if ((partial->bh && partial2->bh) &&
>> -		    (partial->bh->b_blocknr == partial2->bh->b_blocknr)) {
>> -			if ((partial > chain) && (partial2 > chain2)) {
>> -				ext4_free_branches(handle, inode, partial->bh,
>> -						   partial->p + 1,
>> -						   partial2->p,
>> -						   (chain+n-1) - partial);
>> -				BUFFER_TRACE(partial->bh, "call brelse");
>> -				brelse(partial->bh);
>> -				BUFFER_TRACE(partial2->bh, "call brelse");
>> -				brelse(partial2->bh);
>> -			}
>> +	while (partial > chain || partial2 > chain2) {
>> +		int depth = (chain+n-1) - partial;
>> +		int depth2 = (chain2+n2-1) - partial2;
>> +
>> +		if (partial > chain && partial2 > chain2 &&
>> +		    partial->bh->b_blocknr == partial2->bh->b_blocknr) {
>> +			/*
>> +			 * We've converged on the same block. Clear the range,
>> +			 * then we're done.
>> +			 */
>> +			ext4_free_branches(handle, inode, partial->bh,
>> +					   partial->p + 1,
>> +					   partial2->p,
>> +					   (chain+n-1) - partial);
>> +			BUFFER_TRACE(partial->bh, "call brelse");
>> +			brelse(partial->bh);
>> +			BUFFER_TRACE(partial2->bh, "call brelse");
>> +			brelse(partial2->bh);
>>  			return 0;
>>  		}
>> +
>>  		/*
>> -		 * Clear the ends of indirect blocks on the shared branch
>> -		 * at the start of the range
>> +		 * The start and end partial branches may not be at the same
>> +		 * level even though the punch happened within one level. So, we
>> +		 * give them a chance to arrive at the same level, then walk
>> +		 * them in step with each other until we converge on the same
>> +		 * block.
>>  		 */
>> -		if (partial > chain) {
>> +		if (partial > chain && depth <= depth2) {
>>  			ext4_free_branches(handle, inode, partial->bh,
>> -				   partial->p + 1,
>> -				   (__le32 *)partial->bh->b_data+addr_per_block,
>> -				   (chain+n-1) - partial);
>> +					   partial->p + 1,
>> +					   (__le32 *)partial->bh->b_data+addr_per_block,
>> +					   (chain+n-1) - partial);
>>  			BUFFER_TRACE(partial->bh, "call brelse");
>>  			brelse(partial->bh);
>>  			partial--;
>>  		}
>> -		/*
>> -		 * Clear the ends of indirect blocks on the shared branch
>> -		 * at the end of the range
>> -		 */
>> -		if (partial2 > chain2) {
>> +		if (partial2 > chain2 && depth2 <= depth) {
>>  			ext4_free_branches(handle, inode, partial2->bh,
>>  					   (__le32 *)partial2->bh->b_data,
>>  					   partial2->p,
>> -					   (chain2+n-1) - partial2);
>> +					   (chain2+n2-1) - partial2);
>>  			BUFFER_TRACE(partial2->bh, "call brelse");
>>  			brelse(partial2->bh);
>>  			partial2--;
>>  		}
>>  	}
>> +	return 0;
>>  
>>  do_indirects:
>>  	/* Kill the remaining (whole) subtrees */
>>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] ext4: fix indirect punch hole corruption
  2015-02-09 21:03             ` Chris J Arges
@ 2015-02-09 21:28               ` Omar Sandoval
  2015-02-10 21:44                 ` [PATCH v4] " Omar Sandoval
  0 siblings, 1 reply; 15+ messages in thread
From: Omar Sandoval @ 2015-02-09 21:28 UTC (permalink / raw)
  To: Chris J Arges
  Cc: Theodore Ts'o, Andreas Dilger, Lukáš Czerner,
	linux-ext4, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3332 bytes --]

On Mon, Feb 09, 2015 at 03:03:56PM -0600, Chris J Arges wrote:
> On 02/09/2015 12:21 PM, Chris J Arges wrote:
> > On 02/08/2015 06:15 AM, Omar Sandoval wrote:
> >> Commit 4f579ae7de56 (ext4: fix punch hole on files with indirect
> >> mapping) rewrote FALLOC_FL_PUNCH_HOLE for ext4 files with indirect
> >> mapping. However, there are a bugs in a few cases.
> >>
> >> In the case where the punch happens within one level of indirection, we
> >> expect the start and end shared branches to converge on an indirect
> >> block. However, because the branches returned from ext4_find_shared do
> >> not necessarily start at the same level (e.g., the partial2 chain will
> >> be shallower if the last block occurs at the beginning of an indirect
> >> group), the walk of the two chains can end up "missing" each other and
> >> freeing a bunch of extra blocks in the process. This mismatch can be
> >> handled by first making sure that the chains are at the same level, then
> >> walking them together until they converge.
> >>
> >> In the case that a punch spans different levels of indirection, the
> >> original code skips freeing the intermediate indirect trees if the last
> >> block is the first triply-indirected block because it returns instead of
> >> jumping to do_indirects. Additionally, a non-zero nr2 does not mean that
> >> there's nothing else to free at the level of partial2: consider the case
> >> where the all_zeroes in ext4_find_shared backed up the shared branch.
> >>
> >> Signed-off-by: Omar Sandoval <osandov@osandov.com>
> > 
> > Omar,
> > With this patch I no longer seem to be getting the original corruption I
> > detected with my test case; however eventually I do get errors when
> > trying to delete qcow2 snapshots. After getting these errors if I run
> > 'qemu-img check <image>' I see the following errors:
> > 
> > ERROR OFLAG_COPIED data cluster: l2_entry=800000018f7f0000 refcount=0
> > ERROR OFLAG_COPIED data cluster: l2_entry=800000018f800000 refcount=0
> > ERROR OFLAG_COPIED data cluster: l2_entry=800000018f810000 refcount=0
> > 
> > 16941 errors were found on the image.
> > Data may be corrupted, or further writes to the image may corrupt it.
> > 
> > 60459 leaked clusters were found on the image.
> > This means waste of disk space, but no harm to data.
> > 88629/262144 = 33.81% allocated, 9.57% fragmented, 0.00% compressed clusters
> > Image end offset: 10438180864
> > 
> > So this patch seems to have moved the problem. I can collect additional
> > logs if necessary.
> > 
> > Thanks,
> > --chris j arges
> > 
> 
> After ignoring snapshot deletion errors, I've hit the original
> corruption problem with your patch still. I'll continue debugging this.
> --chris j arges
> 
Chris,

Thanks for testing, and sorry about this game of whack-a-mole. Looks
like there's at least one more bug in the n == n2 case (if nr != 0, we
sometimes, but not always, need to free the subtree referred to by it.)
I'll send you another patch as soon as I have the proper fix.

P.S. The fpunch xfstests don't cover these bugs. I'm attaching the
slightly convoluted script I've been using for testing. It's a Python
script that generates a shell script which I then run on the test
machine. The interesting stuff is in known_bugs(), the first three of
which are fixed by the last patch I sent out.

-- 
Omar

[-- Attachment #2: punchtests.py --]
[-- Type: text/x-python, Size: 6790 bytes --]

#!/usr/bin/env python3

import hashlib

NDIR = 12
BLOCK_SIZE = 1024
ADDRS = (BLOCK_SIZE // 4)

ALLOCED_BLOCK = b'\xcd' * BLOCK_SIZE
HOLE_BLOCK = b'\x00' * BLOCK_SIZE

TEST_CASE = 0

def test_case(file_blocks, *args):
    global TEST_CASE
    test_file = '/mnt/test/test.%d' % TEST_CASE
    TEST_CASE += 1

    print('rm -f %s' % test_file)
    print("xfs_io -f -c 'pwrite 0 %d' %s > /dev/null" % (file_blocks * BLOCK_SIZE, test_file))

    blocks = [True] * file_blocks
    for arg in args:
        punch_start, punch_end = arg
        offset = punch_start * BLOCK_SIZE
        length = punch_end * BLOCK_SIZE - offset
        print("xfs_io -c 'fpunch %d %d' %s" % (offset, length, test_file))
        for i in range(punch_start, punch_end):
            blocks[i] = False

    print('umount /mnt/test')
    print('mount /mnt/test')

    print('diff - <(_fiemap %s) << "EOF"' % test_file)

    m = hashlib.md5()
    num = 0
    extent_start = None
    extent_type = None
    for i, alloced in enumerate(blocks):
        m.update(ALLOCED_BLOCK if alloced else HOLE_BLOCK)
        if alloced != extent_type:
            if extent_type is not None:
                start = extent_start * (BLOCK_SIZE // 512)
                end = i * (BLOCK_SIZE // 512) - 1
                print('%d: [%d..%d]: %s' % (num, start, end, 'extent' if extent_type else 'hole'))
                num += 1
            extent_start = i
            extent_type = alloced
    if extent_type is not None:
        if extent_start > 0 or extent_type:
            start = extent_start * (BLOCK_SIZE // 512)
            end = len(blocks) * (BLOCK_SIZE // 512) - 1
            print('%d: [%d..%d]: %s' % (num, start, end, 'extent' if extent_type else 'hole'))
    print('EOF')

    print('diff - <(md5sum %s) << "EOF"' % test_file)
    print('%s  %s' % (m.hexdigest(), test_file))
    print('EOF')
    print()

def simple_tests():
    # Punch within direct level.
    print("echo 'n = 1, n2 = 1'")
    test_case(NDIR, (2, NDIR - 2))
    test_case(NDIR, (0, NDIR // 2))

    # Punch from direct level to starts of indirect levels.
    print("echo 'n2 > n'")
    print("echo '  n = 1'")
    test_case(NDIR + 1, (0, NDIR)) # Start of direct.
    test_case(NDIR + ADDRS + 1, (0, NDIR + ADDRS))
    test_case(NDIR + ADDRS + ADDRS * ADDRS + 1, (0, NDIR + ADDRS + ADDRS * ADDRS))
    test_case(NDIR + 1, (2, NDIR)) # Middle of direct.
    test_case(NDIR + ADDRS + 1, (2, NDIR + ADDRS))
    test_case(NDIR + ADDRS + ADDRS * ADDRS + 1, (2, NDIR + ADDRS + ADDRS * ADDRS))

    # Punch from direct level into indirect levels.
    test_case(NDIR + 2, (0, NDIR + 1)) # Start of direct.
    test_case(NDIR + ADDRS + 2, (0, NDIR + ADDRS + 1))
    test_case(NDIR + ADDRS + ADDRS * ADDRS + 2, (0, NDIR + ADDRS + ADDRS * ADDRS + 1))
    test_case(NDIR + 2, (2, NDIR + 1)) # Middle of direct
    test_case(NDIR + ADDRS + 2, (2, NDIR + ADDRS + 1))
    test_case(NDIR + ADDRS + ADDRS * ADDRS + 2, (2, NDIR + ADDRS + ADDRS * ADDRS + 1))

    # Punch from indirect level into another indirect level.
    print("echo '  n = 2'")
    # IND -> DIND, IND -> TIND, start of end.
    test_case(NDIR + ADDRS + 1, (NDIR, NDIR + ADDRS)) # Start of indirect.
    test_case(NDIR + ADDRS + ADDRS * ADDRS + 1, (NDIR, NDIR + ADDRS + ADDRS * ADDRS))
    test_case(NDIR + ADDRS + 1, (NDIR + 2, NDIR + ADDRS)) # Middle of indirect.
    test_case(NDIR + ADDRS + ADDRS * ADDRS + 1, (NDIR + 2, NDIR + ADDRS + ADDRS * ADDRS))

    # Middle of end.
    test_case(NDIR + ADDRS + 2, (NDIR, NDIR + ADDRS + 1)) # Start of indirect.
    test_case(NDIR + ADDRS + ADDRS * ADDRS + 2, (NDIR, NDIR + ADDRS + ADDRS * ADDRS + 1))
    test_case(NDIR + ADDRS + 2, (NDIR + 2, NDIR + ADDRS + 1)) # Middle of indirect.
    test_case(NDIR + ADDRS + ADDRS * ADDRS + 2, (NDIR + 2, NDIR + ADDRS + ADDRS * ADDRS + 1))


    # DIND -> TIND
    print("echo '  n = 3'")
    test_case(NDIR + ADDRS + ADDRS * ADDRS + 1, (NDIR + ADDRS, NDIR + ADDRS + ADDRS * ADDRS)) # Start of indirect.
    test_case(NDIR + ADDRS + ADDRS * ADDRS + 1, (NDIR + ADDRS + 2, NDIR + ADDRS + ADDRS * ADDRS))
    test_case(NDIR + ADDRS + ADDRS * ADDRS + 2, (NDIR + ADDRS, NDIR + ADDRS + ADDRS * ADDRS + 1)) # Middle of indirect.
    test_case(NDIR + ADDRS + ADDRS * ADDRS + 2, (NDIR + ADDRS + 2, NDIR + ADDRS + ADDRS * ADDRS + 1))

    # Punch within the same indirect level.
    print("echo 'n = 2, n2 = 2'")
    # IND
    test_case(NDIR + ADDRS, (NDIR, NDIR + ADDRS - 1))
    test_case(NDIR + ADDRS, (NDIR + 4, NDIR + ADDRS - 1))

    # DIND
    print("echo 'n = 3, n2 = 3'")
    test_case(NDIR + ADDRS + ADDRS * ADDRS, (NDIR + ADDRS, NDIR + ADDRS + ADDRS * ADDRS - 1))
    test_case(NDIR + ADDRS + ADDRS * ADDRS, (NDIR + ADDRS + 4, NDIR + ADDRS + ADDRS * ADDRS - 1))

    # TIND
    print("echo 'n = 4, n2 = 4'")
    test_case(NDIR + ADDRS + ADDRS * ADDRS + 20, (NDIR + ADDRS + ADDRS * ADDRS, NDIR + ADDRS + ADDRS * ADDRS + 20))
    test_case(NDIR + ADDRS + ADDRS * ADDRS + 20, (NDIR + ADDRS + ADDRS * ADDRS + 4, NDIR + ADDRS + ADDRS * ADDRS + 20))
    test_case(NDIR + ADDRS + ADDRS * ADDRS + 20, (NDIR + ADDRS + ADDRS * ADDRS, NDIR + ADDRS + ADDRS * ADDRS + 19))
    test_case(NDIR + ADDRS + ADDRS * ADDRS + 20, (NDIR + ADDRS + ADDRS * ADDRS + 4, NDIR + ADDRS + ADDRS * ADDRS + 19))

def known_bugs():
    print("echo 'n == n2, uneven partials'")
    test_case(NDIR + ADDRS + 2 * ADDRS, (NDIR + ADDRS + ADDRS // 2, NDIR + ADDRS + ADDRS))

    print("echo 'n <= 2, n2 == 4, end is first triply-indirect block'")
    test_case(NDIR + ADDRS + ADDRS * ADDRS + 4, (0, NDIR + ADDRS + ADDRS * ADDRS))

    print("echo 'n < n2, nr2 != 0 because all_zeroes'")
    test_case(NDIR + ADDRS + 2 * ADDRS + 5, (NDIR + ADDRS + 2 * ADDRS, NDIR + ADDRS + 2 * ADDRS + 4),
                                            (0, NDIR + ADDRS + 2 * ADDRS + 4))

    print("echo 'n == n2, nr != 0 because all_zeroes'")
    test_case(NDIR + ADDRS + 4 * ADDRS, (NDIR + ADDRS + ADDRS, NDIR + ADDRS + ADDRS + 1),
                                        (NDIR + ADDRS + ADDRS + 1, NDIR + ADDRS + 3 * ADDRS))

if __name__ == '__main__':
    print('#!/bin/bash')

    print(r"""
_coalesce_extents () {
	awk -F: '
	{
		range = $2;
		type = $3;

		split(range, bounds, "[\\[ \\.\\]]");
		low = bounds[3];
		high = bounds[5];

		if (type != prev_type) {
			if (prev_type != "")
				printf("%u]:%s\n", low - 1, prev_type);
			printf("%u: [%u..", out_count++, low);
			prev_type = type;
		}
	}
	END {
		if (prev_type != "")
			printf("%u]:%s\n", high, prev_type);
	}'
}

_filter_hole_fiemap () {
	awk --posix '
		$3 ~ /hole/ {
			print $1, $2, $3; 
			next;
		}   
		$5 ~ /0x[[:xdigit:]]+/ {
			print $1, $2, "extent";
		}' |
	_coalesce_extents
}

_fiemap () {
    xfs_io -c 'fiemap -v' "$1" | _filter_hole_fiemap
}
""")

    # simple_tests()
    known_bugs()

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v4] ext4: fix indirect punch hole corruption
  2015-02-09 21:28               ` Omar Sandoval
@ 2015-02-10 21:44                 ` Omar Sandoval
  2015-02-11  2:59                   ` Chris J Arges
  0 siblings, 1 reply; 15+ messages in thread
From: Omar Sandoval @ 2015-02-10 21:44 UTC (permalink / raw)
  To: Chris J Arges, Theodore Ts'o, Andreas Dilger,
	Lukáš Czerner
  Cc: linux-ext4, linux-kernel, Omar Sandoval

Commit 4f579ae7de56 (ext4: fix punch hole on files with indirect
mapping) rewrote FALLOC_FL_PUNCH_HOLE for ext4 files with indirect
mapping. However, there are bugs in several corner cases. This fixes 5
distinct bugs:

1. When there is at least one entire level of indirection between the
start and end of the punch range and the end of the punch range is the
first block of its level, we can't return early; we have to free the
intervening levels.

2. When the end is at a higher level of indirection than the start and
ext4_find_shared returns a top branch for the end, we still need to free
the rest of the shared branch it returns; we can't decrement partial2.

3. When a punch happens within one level of indirection, we need to
converge on an indirect block that contains the start and end. However,
because the branches returned from ext4_find_shared do not necessarily
start at the same level (e.g., the partial2 chain will be shallower if
the last block occurs at the beginning of an indirect group), the walk
of the two chains can end up "missing" each other and freeing a bunch of
extra blocks in the process. This mismatch can be handled by first
making sure that the chains are at the same level, then walking them
together until they converge.

4. When the punch happens within one level of indirection and
ext4_find_shared returns a top branch for the start, we must free it,
but only if the end does not occur within that branch.

5. When the punch happens within one level of indirection and
ext4_find_shared returns a top branch for the end, then we shouldn't
free the block referenced by the end of the returned chain (this mirrors
the different levels case).

Signed-off-by: Omar Sandoval <osandov@osandov.com>
---
Okay, two more bugfixes folded in, all described in the commit message.
I'm finally no longer seeing xfstest generic/270 cause corruptions, even
after running it overnight, so hopefully this is it. Chris, would you
mind trying this out?

 fs/ext4/indirect.c | 105 ++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 71 insertions(+), 34 deletions(-)

diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
index 36b3696..5e7af1c 100644
--- a/fs/ext4/indirect.c
+++ b/fs/ext4/indirect.c
@@ -1393,10 +1393,7 @@ end_range:
 				 * to free. Everything was covered by the start
 				 * of the range.
 				 */
-				return 0;
-			} else {
-				/* Shared branch grows from an indirect block */
-				partial2--;
+				goto do_indirects;
 			}
 		} else {
 			/*
@@ -1427,56 +1424,96 @@ end_range:
 	/* Punch happened within the same level (n == n2) */
 	partial = ext4_find_shared(inode, n, offsets, chain, &nr);
 	partial2 = ext4_find_shared(inode, n2, offsets2, chain2, &nr2);
-	/*
-	 * ext4_find_shared returns Indirect structure which
-	 * points to the last element which should not be
-	 * removed by truncate. But this is end of the range
-	 * in punch_hole so we need to point to the next element
-	 */
-	partial2->p++;
-	while ((partial > chain) || (partial2 > chain2)) {
-		/* We're at the same block, so we're almost finished */
-		if ((partial->bh && partial2->bh) &&
-		    (partial->bh->b_blocknr == partial2->bh->b_blocknr)) {
-			if ((partial > chain) && (partial2 > chain2)) {
+
+	/* Free top, but only if partial2 isn't its subtree. */
+	if (nr) {
+		int level = min(partial - chain, partial2 - chain2);
+		int i;
+		int subtree = 1;
+
+		for (i = 0; i <= level; i++) {
+			if (offsets[i] != offsets2[i]) {
+				subtree = 0;
+				break;
+			}
+		}
+
+		if (!subtree) {
+			if (partial == chain) {
+				/* Shared branch grows from the inode */
+				ext4_free_branches(handle, inode, NULL,
+						   &nr, &nr+1,
+						   (chain+n-1) - partial);
+				*partial->p = 0;
+			} else {
+				/* Shared branch grows from an indirect block */
+				BUFFER_TRACE(partial->bh, "get_write_access");
 				ext4_free_branches(handle, inode, partial->bh,
-						   partial->p + 1,
-						   partial2->p,
+						   partial->p,
+						   partial->p+1,
 						   (chain+n-1) - partial);
-				BUFFER_TRACE(partial->bh, "call brelse");
-				brelse(partial->bh);
-				BUFFER_TRACE(partial2->bh, "call brelse");
-				brelse(partial2->bh);
 			}
-			return 0;
 		}
+	}
+
+	if (!nr2) {
 		/*
-		 * Clear the ends of indirect blocks on the shared branch
-		 * at the start of the range
+		 * ext4_find_shared returns Indirect structure which
+		 * points to the last element which should not be
+		 * removed by truncate. But this is end of the range
+		 * in punch_hole so we need to point to the next element
 		 */
-		if (partial > chain) {
+		partial2->p++;
+	}
+
+	while (partial > chain || partial2 > chain2) {
+		int depth = (chain+n-1) - partial;
+		int depth2 = (chain2+n2-1) - partial2;
+
+		if (partial > chain && partial2 > chain2 &&
+		    partial->bh->b_blocknr == partial2->bh->b_blocknr) {
+			/*
+			 * We've converged on the same block. Clear the range,
+			 * then we're done.
+			 */
 			ext4_free_branches(handle, inode, partial->bh,
-				   partial->p + 1,
-				   (__le32 *)partial->bh->b_data+addr_per_block,
-				   (chain+n-1) - partial);
+					   partial->p + 1,
+					   partial2->p,
+					   (chain+n-1) - partial);
 			BUFFER_TRACE(partial->bh, "call brelse");
 			brelse(partial->bh);
-			partial--;
+			BUFFER_TRACE(partial2->bh, "call brelse");
+			brelse(partial2->bh);
+			return 0;
 		}
+
 		/*
-		 * Clear the ends of indirect blocks on the shared branch
-		 * at the end of the range
+		 * The start and end partial branches may not be at the same
+		 * level even though the punch happened within one level. So, we
+		 * give them a chance to arrive at the same level, then walk
+		 * them in step with each other until we converge on the same
+		 * block.
 		 */
-		if (partial2 > chain2) {
+		if (partial > chain && depth <= depth2) {
+			ext4_free_branches(handle, inode, partial->bh,
+					   partial->p + 1,
+					   (__le32 *)partial->bh->b_data+addr_per_block,
+					   (chain+n-1) - partial);
+			BUFFER_TRACE(partial->bh, "call brelse");
+			brelse(partial->bh);
+			partial--;
+		}
+		if (partial2 > chain2 && depth2 <= depth) {
 			ext4_free_branches(handle, inode, partial2->bh,
 					   (__le32 *)partial2->bh->b_data,
 					   partial2->p,
-					   (chain2+n-1) - partial2);
+					   (chain2+n2-1) - partial2);
 			BUFFER_TRACE(partial2->bh, "call brelse");
 			brelse(partial2->bh);
 			partial2--;
 		}
 	}
+	return 0;
 
 do_indirects:
 	/* Kill the remaining (whole) subtrees */
-- 
2.3.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4] ext4: fix indirect punch hole corruption
  2015-02-10 21:44                 ` [PATCH v4] " Omar Sandoval
@ 2015-02-11  2:59                   ` Chris J Arges
  2015-02-11  3:37                     ` Omar Sandoval
  0 siblings, 1 reply; 15+ messages in thread
From: Chris J Arges @ 2015-02-11  2:59 UTC (permalink / raw)
  To: Omar Sandoval, Theodore Ts'o, Andreas Dilger,
	Lukáš Czerner
  Cc: linux-ext4, linux-kernel

On 02/10/2015 03:44 PM, Omar Sandoval wrote:
> Commit 4f579ae7de56 (ext4: fix punch hole on files with indirect
> mapping) rewrote FALLOC_FL_PUNCH_HOLE for ext4 files with indirect
> mapping. However, there are bugs in several corner cases. This fixes 5
> distinct bugs:
> 
> 1. When there is at least one entire level of indirection between the
> start and end of the punch range and the end of the punch range is the
> first block of its level, we can't return early; we have to free the
> intervening levels.
> 
> 2. When the end is at a higher level of indirection than the start and
> ext4_find_shared returns a top branch for the end, we still need to free
> the rest of the shared branch it returns; we can't decrement partial2.
> 
> 3. When a punch happens within one level of indirection, we need to
> converge on an indirect block that contains the start and end. However,
> because the branches returned from ext4_find_shared do not necessarily
> start at the same level (e.g., the partial2 chain will be shallower if
> the last block occurs at the beginning of an indirect group), the walk
> of the two chains can end up "missing" each other and freeing a bunch of
> extra blocks in the process. This mismatch can be handled by first
> making sure that the chains are at the same level, then walking them
> together until they converge.
> 
> 4. When the punch happens within one level of indirection and
> ext4_find_shared returns a top branch for the start, we must free it,
> but only if the end does not occur within that branch.
> 
> 5. When the punch happens within one level of indirection and
> ext4_find_shared returns a top branch for the end, then we shouldn't
> free the block referenced by the end of the returned chain (this mirrors
> the different levels case).
> 
> Signed-off-by: Omar Sandoval <osandov@osandov.com>
> ---
> Okay, two more bugfixes folded in, all described in the commit message.
> I'm finally no longer seeing xfstest generic/270 cause corruptions, even
> after running it overnight, so hopefully this is it. Chris, would you
> mind trying this out?
>

Omar,
I've completed 80 iterations of this patch so far without failure!
Normally failures have occurred between 2-15 runs. Great job, and thanks
for your persistence in fixing this issue!

Tested-by: Chris J Arges <chris.j.arges@canonical.com>


>  fs/ext4/indirect.c | 105 ++++++++++++++++++++++++++++++++++++-----------------
>  1 file changed, 71 insertions(+), 34 deletions(-)
> 
> diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
> index 36b3696..5e7af1c 100644
> --- a/fs/ext4/indirect.c
> +++ b/fs/ext4/indirect.c
> @@ -1393,10 +1393,7 @@ end_range:
>  				 * to free. Everything was covered by the start
>  				 * of the range.
>  				 */
> -				return 0;
> -			} else {
> -				/* Shared branch grows from an indirect block */
> -				partial2--;
> +				goto do_indirects;
>  			}
>  		} else {
>  			/*
> @@ -1427,56 +1424,96 @@ end_range:
>  	/* Punch happened within the same level (n == n2) */
>  	partial = ext4_find_shared(inode, n, offsets, chain, &nr);
>  	partial2 = ext4_find_shared(inode, n2, offsets2, chain2, &nr2);
> -	/*
> -	 * ext4_find_shared returns Indirect structure which
> -	 * points to the last element which should not be
> -	 * removed by truncate. But this is end of the range
> -	 * in punch_hole so we need to point to the next element
> -	 */
> -	partial2->p++;
> -	while ((partial > chain) || (partial2 > chain2)) {
> -		/* We're at the same block, so we're almost finished */
> -		if ((partial->bh && partial2->bh) &&
> -		    (partial->bh->b_blocknr == partial2->bh->b_blocknr)) {
> -			if ((partial > chain) && (partial2 > chain2)) {
> +
> +	/* Free top, but only if partial2 isn't its subtree. */
> +	if (nr) {
> +		int level = min(partial - chain, partial2 - chain2);
> +		int i;
> +		int subtree = 1;
> +
> +		for (i = 0; i <= level; i++) {
> +			if (offsets[i] != offsets2[i]) {
> +				subtree = 0;
> +				break;
> +			}
> +		}
> +
> +		if (!subtree) {
> +			if (partial == chain) {
> +				/* Shared branch grows from the inode */
> +				ext4_free_branches(handle, inode, NULL,
> +						   &nr, &nr+1,
> +						   (chain+n-1) - partial);
> +				*partial->p = 0;
> +			} else {
> +				/* Shared branch grows from an indirect block */
> +				BUFFER_TRACE(partial->bh, "get_write_access");
>  				ext4_free_branches(handle, inode, partial->bh,
> -						   partial->p + 1,
> -						   partial2->p,
> +						   partial->p,
> +						   partial->p+1,
>  						   (chain+n-1) - partial);
> -				BUFFER_TRACE(partial->bh, "call brelse");
> -				brelse(partial->bh);
> -				BUFFER_TRACE(partial2->bh, "call brelse");
> -				brelse(partial2->bh);
>  			}
> -			return 0;
>  		}
> +	}
> +
> +	if (!nr2) {
>  		/*
> -		 * Clear the ends of indirect blocks on the shared branch
> -		 * at the start of the range
> +		 * ext4_find_shared returns Indirect structure which
> +		 * points to the last element which should not be
> +		 * removed by truncate. But this is end of the range
> +		 * in punch_hole so we need to point to the next element
>  		 */
> -		if (partial > chain) {
> +		partial2->p++;
> +	}
> +
> +	while (partial > chain || partial2 > chain2) {
> +		int depth = (chain+n-1) - partial;
> +		int depth2 = (chain2+n2-1) - partial2;
> +
> +		if (partial > chain && partial2 > chain2 &&
> +		    partial->bh->b_blocknr == partial2->bh->b_blocknr) {
> +			/*
> +			 * We've converged on the same block. Clear the range,
> +			 * then we're done.
> +			 */
>  			ext4_free_branches(handle, inode, partial->bh,
> -				   partial->p + 1,
> -				   (__le32 *)partial->bh->b_data+addr_per_block,
> -				   (chain+n-1) - partial);
> +					   partial->p + 1,
> +					   partial2->p,
> +					   (chain+n-1) - partial);
>  			BUFFER_TRACE(partial->bh, "call brelse");
>  			brelse(partial->bh);
> -			partial--;
> +			BUFFER_TRACE(partial2->bh, "call brelse");
> +			brelse(partial2->bh);
> +			return 0;
>  		}
> +
>  		/*
> -		 * Clear the ends of indirect blocks on the shared branch
> -		 * at the end of the range
> +		 * The start and end partial branches may not be at the same
> +		 * level even though the punch happened within one level. So, we
> +		 * give them a chance to arrive at the same level, then walk
> +		 * them in step with each other until we converge on the same
> +		 * block.
>  		 */
> -		if (partial2 > chain2) {
> +		if (partial > chain && depth <= depth2) {
> +			ext4_free_branches(handle, inode, partial->bh,
> +					   partial->p + 1,
> +					   (__le32 *)partial->bh->b_data+addr_per_block,
> +					   (chain+n-1) - partial);
> +			BUFFER_TRACE(partial->bh, "call brelse");
> +			brelse(partial->bh);
> +			partial--;
> +		}
> +		if (partial2 > chain2 && depth2 <= depth) {
>  			ext4_free_branches(handle, inode, partial2->bh,
>  					   (__le32 *)partial2->bh->b_data,
>  					   partial2->p,
> -					   (chain2+n-1) - partial2);
> +					   (chain2+n2-1) - partial2);
>  			BUFFER_TRACE(partial2->bh, "call brelse");
>  			brelse(partial2->bh);
>  			partial2--;
>  		}
>  	}
> +	return 0;
>  
>  do_indirects:
>  	/* Kill the remaining (whole) subtrees */
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4] ext4: fix indirect punch hole corruption
  2015-02-11  2:59                   ` Chris J Arges
@ 2015-02-11  3:37                     ` Omar Sandoval
  2015-02-17 18:59                       ` Omar Sandoval
  0 siblings, 1 reply; 15+ messages in thread
From: Omar Sandoval @ 2015-02-11  3:37 UTC (permalink / raw)
  To: Chris J Arges
  Cc: Theodore Ts'o, Andreas Dilger, Lukáš Czerner,
	linux-ext4, linux-kernel

On Tue, Feb 10, 2015 at 08:59:23PM -0600, Chris J Arges wrote:
> On 02/10/2015 03:44 PM, Omar Sandoval wrote:
> > Commit 4f579ae7de56 (ext4: fix punch hole on files with indirect
> > mapping) rewrote FALLOC_FL_PUNCH_HOLE for ext4 files with indirect
> > mapping. However, there are bugs in several corner cases. This fixes 5
> > distinct bugs:
> > 
> > 1. When there is at least one entire level of indirection between the
> > start and end of the punch range and the end of the punch range is the
> > first block of its level, we can't return early; we have to free the
> > intervening levels.
> > 
> > 2. When the end is at a higher level of indirection than the start and
> > ext4_find_shared returns a top branch for the end, we still need to free
> > the rest of the shared branch it returns; we can't decrement partial2.
> > 
> > 3. When a punch happens within one level of indirection, we need to
> > converge on an indirect block that contains the start and end. However,
> > because the branches returned from ext4_find_shared do not necessarily
> > start at the same level (e.g., the partial2 chain will be shallower if
> > the last block occurs at the beginning of an indirect group), the walk
> > of the two chains can end up "missing" each other and freeing a bunch of
> > extra blocks in the process. This mismatch can be handled by first
> > making sure that the chains are at the same level, then walking them
> > together until they converge.
> > 
> > 4. When the punch happens within one level of indirection and
> > ext4_find_shared returns a top branch for the start, we must free it,
> > but only if the end does not occur within that branch.
> > 
> > 5. When the punch happens within one level of indirection and
> > ext4_find_shared returns a top branch for the end, then we shouldn't
> > free the block referenced by the end of the returned chain (this mirrors
> > the different levels case).
> > 
> > Signed-off-by: Omar Sandoval <osandov@osandov.com>
> > ---
> > Okay, two more bugfixes folded in, all described in the commit message.
> > I'm finally no longer seeing xfstest generic/270 cause corruptions, even
> > after running it overnight, so hopefully this is it. Chris, would you
> > mind trying this out?
> >
> 
> Omar,
> I've completed 80 iterations of this patch so far without failure!
> Normally failures have occurred between 2-15 runs. Great job, and thanks
> for your persistence in fixing this issue!
> 
> Tested-by: Chris J Arges <chris.j.arges@canonical.com>
> 

Awesome, I was starting to run out of ideas ;) Thanks for all of your
testing.

Lukáš, would you like to take a look at this?

Also, Ted and Andreas, would you prefer this all in one patch, or should
I split out each individual fix into its own patch?

Thanks!
-- 
Omar

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4] ext4: fix indirect punch hole corruption
  2015-02-11  3:37                     ` Omar Sandoval
@ 2015-02-17 18:59                       ` Omar Sandoval
  0 siblings, 0 replies; 15+ messages in thread
From: Omar Sandoval @ 2015-02-17 18:59 UTC (permalink / raw)
  To: Theodore Ts'o, Andreas Dilger, Lukáš Czerner
  Cc: Theodore Ts'o, Andreas Dilger, Lukáš Czerner,
	linux-ext4, linux-kernel

On Tue, Feb 10, 2015 at 07:37:20PM -0800, Omar Sandoval wrote:
> On Tue, Feb 10, 2015 at 08:59:23PM -0600, Chris J Arges wrote:
> > On 02/10/2015 03:44 PM, Omar Sandoval wrote:
> > > Commit 4f579ae7de56 (ext4: fix punch hole on files with indirect
> > > mapping) rewrote FALLOC_FL_PUNCH_HOLE for ext4 files with indirect
> > > mapping. However, there are bugs in several corner cases. This fixes 5
> > > distinct bugs:
> > > 
> > > 1. When there is at least one entire level of indirection between the
> > > start and end of the punch range and the end of the punch range is the
> > > first block of its level, we can't return early; we have to free the
> > > intervening levels.
> > > 
> > > 2. When the end is at a higher level of indirection than the start and
> > > ext4_find_shared returns a top branch for the end, we still need to free
> > > the rest of the shared branch it returns; we can't decrement partial2.
> > > 
> > > 3. When a punch happens within one level of indirection, we need to
> > > converge on an indirect block that contains the start and end. However,
> > > because the branches returned from ext4_find_shared do not necessarily
> > > start at the same level (e.g., the partial2 chain will be shallower if
> > > the last block occurs at the beginning of an indirect group), the walk
> > > of the two chains can end up "missing" each other and freeing a bunch of
> > > extra blocks in the process. This mismatch can be handled by first
> > > making sure that the chains are at the same level, then walking them
> > > together until they converge.
> > > 
> > > 4. When the punch happens within one level of indirection and
> > > ext4_find_shared returns a top branch for the start, we must free it,
> > > but only if the end does not occur within that branch.
> > > 
> > > 5. When the punch happens within one level of indirection and
> > > ext4_find_shared returns a top branch for the end, then we shouldn't
> > > free the block referenced by the end of the returned chain (this mirrors
> > > the different levels case).
> > > 
> > > Signed-off-by: Omar Sandoval <osandov@osandov.com>
> > > ---
> > > Okay, two more bugfixes folded in, all described in the commit message.
> > > I'm finally no longer seeing xfstest generic/270 cause corruptions, even
> > > after running it overnight, so hopefully this is it. Chris, would you
> > > mind trying this out?
> > >
> > 
> > Omar,
> > I've completed 80 iterations of this patch so far without failure!
> > Normally failures have occurred between 2-15 runs. Great job, and thanks
> > for your persistence in fixing this issue!
> > 
> > Tested-by: Chris J Arges <chris.j.arges@canonical.com>
> > 
> 
> Awesome, I was starting to run out of ideas ;) Thanks for all of your
> testing.
> 
> Lukáš, would you like to take a look at this?
> 
> Also, Ted and Andreas, would you prefer this all in one patch, or should
> I split out each individual fix into its own patch?
> 
> Thanks!
> -- 
> Omar

Hi,

I figure things are busy because of the merge window, but I wanted to
check on the status of this patch. I also have some regression tests for
xfstests ready to go.

Thanks!
-- 
Omar

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2015-02-17 18:59 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-05 20:50 [PATCH] ext4: fix indirect punch hole corruption Omar Sandoval
2015-02-05 20:56 ` Omar Sandoval
2015-02-05 21:30 ` Chris J Arges
2015-02-05 21:41   ` Omar Sandoval
2015-02-07  0:28   ` Omar Sandoval
2015-02-07  0:35     ` Chris J Arges
2015-02-07 21:57       ` [PATCH v2] " Omar Sandoval
2015-02-08 12:15         ` [PATCH v3] " Omar Sandoval
2015-02-09 18:21           ` Chris J Arges
2015-02-09 21:03             ` Chris J Arges
2015-02-09 21:28               ` Omar Sandoval
2015-02-10 21:44                 ` [PATCH v4] " Omar Sandoval
2015-02-11  2:59                   ` Chris J Arges
2015-02-11  3:37                     ` Omar Sandoval
2015-02-17 18:59                       ` Omar Sandoval

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).