LKML Archive on lore.kernel.org
From: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
To: dsterba@suse.cz, clm@fb.com, josef@toxicpanda.com,
	dsterba@suse.com, anand.jain@oracle.com,
	linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org,
	skhan@linuxfoundation.org, gregkh@linuxfoundation.org,
	linux-kernel-mentees@lists.linuxfoundation.org,
	syzbot+a70e2ad0879f160b9217@syzkaller.appspotmail.com
Subject: Re: [PATCH v2] btrfs: fix rw device counting in __btrfs_free_extra_devids
Date: Thu, 12 Aug 2021 23:43:16 +0800
Message-ID: <3c48eec9-590c-4974-4026-f74cafa5ac48@gmail.com>
In-Reply-To: <20210812103851.GC5047@twin.jikos.cz>

On 12/8/21 6:38 pm, David Sterba wrote:
> On Tue, Jul 27, 2021 at 03:13:03PM +0800, Desmond Cheong Zhi Xi wrote:
>> When removing a writeable device in __btrfs_free_extra_devids, the rw
>> device count should be decremented.
>>
>> This error was caught by Syzbot which reported a warning in
>> close_fs_devices because fs_devices->rw_devices was not 0 after
>> closing all devices. Here is the call trace that was observed:
>>
>>    btrfs_mount_root():
>>      btrfs_scan_one_device():
>>        device_list_add();   <---------------- device added
>>      btrfs_open_devices():
>>        open_fs_devices():
>>          btrfs_open_one_device();   <-------- writable device opened,
>> 	                                     rw device count ++
>>      btrfs_fill_super():
>>        open_ctree():
>>          btrfs_free_extra_devids():
>> 	  __btrfs_free_extra_devids();  <--- writable device removed,
>> 	                              rw device count not decremented
>> 	  fail_tree_roots:
>> 	    btrfs_close_devices():
>> 	      close_fs_devices();   <------- rw device count off by 1
>>
>> As a note, prior to commit cf89af146b7e ("btrfs: dev-replace: fail
>> mount if we don't have replace item with target device"), rw_devices
>> was decremented on removing a writable device in
>> __btrfs_free_extra_devids only if the BTRFS_DEV_STATE_REPLACE_TGT bit
>> was not set for the device. However, this check does not need to be
>> reinstated as it is now redundant and incorrect.
>>
>> In __btrfs_free_extra_devids, we skip removing the device if it is the
>> target for replacement. This is done by checking whether device->devid
>> == BTRFS_DEV_REPLACE_DEVID. Since BTRFS_DEV_STATE_REPLACE_TGT is set
>> only on the device with devid BTRFS_DEV_REPLACE_DEVID, no devices
>> should have the BTRFS_DEV_STATE_REPLACE_TGT bit set after the check,
>> and so it's redundant to test for that bit.
>>
>> Additionally, following commit 82372bc816d7 ("Btrfs: make
>> the logic of source device removing more clear"), rw_devices is
>> incremented whenever a writeable device is added to the alloc
>> list (including the target device in btrfs_dev_replace_finishing), so
>> all removals of writable devices from the alloc list should also be
>> accompanied by a decrement to rw_devices.
>>
>> Fixes: cf89af146b7e ("btrfs: dev-replace: fail mount if we don't have replace item with target device")
>> Reported-by: syzbot+a70e2ad0879f160b9217@syzkaller.appspotmail.com
>> Tested-by: syzbot+a70e2ad0879f160b9217@syzkaller.appspotmail.com
>> Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
>> Reviewed-by: Anand Jain <anand.jain@oracle.com>
>> ---
>>   fs/btrfs/volumes.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 807502cd6510..916c25371658 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -1078,6 +1078,7 @@ static void __btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices,
>>   		if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
>>   			list_del_init(&device->dev_alloc_list);
>>   			clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
>> +			fs_devices->rw_devices--;
>>   		}
>>   		list_del_init(&device->dev_list);
>>   		fs_devices->num_devices--;
> 
> I've hit a crash on master branch with stacktrace very similar to one
> this bug was supposed to fix. It's a failed assertion on device close.
> This patch was the last one to touch it and it matches some of the
> keywords, namely the BTRFS_DEV_STATE_REPLACE_TGT bit that used to be in
> the original patch but was not reinstated in your fix.
> 
> I'm not sure how reproducible it is, right now I have only one instance
> and am hunting another strange problem. They could be related.
> 
> assertion failed: !test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state), in fs/btrfs/volumes.c:1150
> 
> https://susepaste.org/view/raw/18223056 full log with other stacktraces,
> possibly related
> 

Looking at the logs, it seems that a dev_replace was started and then
suspended, but it was neither canceled nor resumed before the fs devices
were closed.

I'll investigate further; for now I'm just throwing some observations out there.
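
For reference, the counting problem that the quoted patch addresses can
be sketched with a small userspace model. This is only an illustration
under stated assumptions: the toy_* names, the fixed-size array, and the
`fixed` flag are made up here and are not the real btrfs structures or
functions. It follows the quoted call trace: two writable devices are
opened, one is removed as an extra device, and then all devices are
closed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_DEVS 4

/* Toy stand-ins for the kernel structures; NOT the real btrfs definitions. */
struct toy_device {
	bool present;    /* still on the device list */
	bool writeable;  /* models BTRFS_DEV_STATE_WRITEABLE */
};

struct toy_fs_devices {
	struct toy_device devs[TOY_MAX_DEVS];
	int num_devices;
	int rw_devices;
};

/* Models opening a device: a writable device bumps rw_devices. */
static void toy_open_device(struct toy_fs_devices *fs, size_t i, bool writeable)
{
	fs->devs[i].present = true;
	fs->devs[i].writeable = writeable;
	fs->num_devices++;
	if (writeable)
		fs->rw_devices++;
}

/* Models removing an extra device; `fixed` selects pre- or post-patch behavior. */
static void toy_free_extra_device(struct toy_fs_devices *fs, size_t i, bool fixed)
{
	if (fs->devs[i].writeable) {
		fs->devs[i].writeable = false;
		if (fixed)
			fs->rw_devices--;  /* the one-line change from the patch */
	}
	fs->devs[i].present = false;
	fs->num_devices--;
}

/* Models closing every remaining device; returns the leftover rw count. */
static int toy_close_devices(struct toy_fs_devices *fs)
{
	for (size_t i = 0; i < TOY_MAX_DEVS; i++) {
		if (!fs->devs[i].present)
			continue;
		if (fs->devs[i].writeable)
			fs->rw_devices--;
		fs->devs[i].writeable = false;
		fs->devs[i].present = false;
		fs->num_devices--;
	}
	return fs->rw_devices;
}

/* Replays the open -> free extra -> close sequence from the call trace. */
static int toy_leftover_rw(bool fixed)
{
	struct toy_fs_devices fs = {0};

	toy_open_device(&fs, 0, true);
	toy_open_device(&fs, 1, true);
	toy_free_extra_device(&fs, 1, fixed);
	return toy_close_devices(&fs);
}
```

In this model, toy_leftover_rw(false) ends with one rw device still
counted after close, while toy_leftover_rw(true) ends at zero. It does
not model the REPLACE_TGT assertion reported above.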

> [ 3310.383268] kernel BUG at fs/btrfs/ctree.h:3431!
> [ 3310.384586] invalid opcode: 0000 [#1] PREEMPT SMP
> [ 3310.385848] CPU: 1 PID: 3982 Comm: umount Tainted: G        W         5.14.0-rc5-default+ #1532
> [ 3310.388216] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
> [ 3310.391054] RIP: 0010:assertfail.constprop.0+0x18/0x1a [btrfs]
> [ 3310.397628] RSP: 0018:ffffb7a5454c7db8 EFLAGS: 00010246
> [ 3310.399079] RAX: 0000000000000068 RBX: ffff978364b91c00 RCX: 0000000000000000
> [ 3310.400990] RDX: 0000000000000000 RSI: ffffffffabee13c4 RDI: 00000000ffffffff
> [ 3310.402504] RBP: ffff9783523a4c00 R08: 0000000000000001 R09: 0000000000000001
> [ 3310.404025] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9783523a4d18
> [ 3310.405074] R13: 0000000000000000 R14: 0000000000000004 R15: 0000000000000003
> [ 3310.406130] FS:  00007f61c8f42800(0000) GS:ffff9783bd800000(0000) knlGS:0000000000000000
> [ 3310.407649] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 3310.409022] CR2: 000056190cffa810 CR3: 0000000030b96002 CR4: 0000000000170ea0
> [ 3310.410111] Call Trace:
> [ 3310.410561]  btrfs_close_one_device.cold+0x11/0x55 [btrfs]
> [ 3310.411788]  close_fs_devices+0x44/0xb0 [btrfs]
> [ 3310.412654]  btrfs_close_devices+0x48/0x160 [btrfs]
> [ 3310.413449]  generic_shutdown_super+0x69/0x100
> [ 3310.414155]  kill_anon_super+0x14/0x30
> [ 3310.414802]  btrfs_kill_super+0x12/0x20 [btrfs]
> [ 3310.415767]  deactivate_locked_super+0x2c/0xa0
> [ 3310.416562]  cleanup_mnt+0x144/0x1b0
> [ 3310.417153]  task_work_run+0x59/0xa0
> [ 3310.417744]  exit_to_user_mode_loop+0xe7/0xf0
> [ 3310.418440]  exit_to_user_mode_prepare+0xaf/0xf0
> [ 3310.419425]  syscall_exit_to_user_mode+0x19/0x50
> [ 3310.420281]  do_syscall_64+0x4a/0x90
> [ 3310.420934]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 3310.421718] RIP: 0033:0x7f61c91654db
> 
