LKML Archive on
help / color / mirror / Atom feed
From: Desmond Cheong Zhi Xi <>
Subject: Re: [PATCH v2] btrfs: fix rw device counting in __btrfs_free_extra_devids
Date: Fri, 13 Aug 2021 01:31:25 +0800	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <>

On 12/8/21 11:50 pm, David Sterba wrote:
> On Thu, Aug 12, 2021 at 11:43:16PM +0800, Desmond Cheong Zhi Xi wrote:
>> On 12/8/21 6:38 pm, David Sterba wrote:
>>> On Tue, Jul 27, 2021 at 03:13:03PM +0800, Desmond Cheong Zhi Xi wrote:
>>>> When removing a writeable device in __btrfs_free_extra_devids, the rw
>>>> device count should be decremented.
>>>> This error was caught by Syzbot which reported a warning in
>>>> close_fs_devices because fs_devices->rw_devices was not 0 after
>>>> closing all devices. Here is the call trace that was observed:
>>>>     btrfs_mount_root():
>>>>       btrfs_scan_one_device():
>>>>         device_list_add();   <---------------- device added
>>>>       btrfs_open_devices():
>>>>         open_fs_devices():
>>>>           btrfs_open_one_device();   <-------- writable device opened,
>>>> 	                                     rw device count ++
>>>>       btrfs_fill_super():
>>>>         open_ctree():
>>>>           btrfs_free_extra_devids():
>>>> 	  __btrfs_free_extra_devids();  <--- writable device removed,
>>>> 	                              rw device count not decremented
>>>> 	  fail_tree_roots:
>>>> 	    btrfs_close_devices():
>>>> 	      close_fs_devices();   <------- rw device count off by 1
>>>> As a note, prior to commit cf89af146b7e ("btrfs: dev-replace: fail
>>>> mount if we don't have replace item with target device"), rw_devices
>>>> was decremented on removing a writable device in
>>>> __btrfs_free_extra_devids only if the BTRFS_DEV_STATE_REPLACE_TGT bit
>>>> was not set for the device. However, this check does not need to be
>>>> reinstated as it is now redundant and incorrect.
>>>> In __btrfs_free_extra_devids, we skip removing the device if it is the
>>>> target for replacement. This is done by checking whether device->devid
>>>> only on the device with devid BTRFS_DEV_REPLACE_DEVID, no devices
>>>> should have the BTRFS_DEV_STATE_REPLACE_TGT bit set after the check,
>>>> and so it's redundant to test for that bit.
>>>> Additionally, following commit 82372bc816d7 ("Btrfs: make
>>>> the logic of source device removing more clear"), rw_devices is
>>>> incremented whenever a writeable device is added to the alloc
>>>> list (including the target device in btrfs_dev_replace_finishing), so
>>>> all removals of writable devices from the alloc list should also be
>>>> accompanied by a decrement to rw_devices.
>>>> Fixes: cf89af146b7e ("btrfs: dev-replace: fail mount if we don't have replace item with target device")
>>>> Reported-by:
>>>> Tested-by:
>>>> Signed-off-by: Desmond Cheong Zhi Xi <>
>>>> Reviewed-by: Anand Jain <>
>>>> ---
>>>>    fs/btrfs/volumes.c | 1 +
>>>>    1 file changed, 1 insertion(+)
>>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>>> index 807502cd6510..916c25371658 100644
>>>> --- a/fs/btrfs/volumes.c
>>>> +++ b/fs/btrfs/volumes.c
>>>> @@ -1078,6 +1078,7 @@ static void __btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices,
>>>>    		if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
>>>>    			list_del_init(&device->dev_alloc_list);
>>>>    			clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
>>>> +			fs_devices->rw_devices--;
>>>>    		}
>>>>    		list_del_init(&device->dev_list);
>>>>    		fs_devices->num_devices--;
>>> I've hit a crash on master branch with stacktrace very similar to one
>>> this bug was supposed to fix. It's a failed assertion on device close.
>>> This patch was the last one to touch it and it matches some of the
>>> keywords, namely the BTRFS_DEV_STATE_REPLACE_TGT bit that used to be in
>>> the original patch but was not reinstated in your fix.
>>> I'm not sure how reproducible it is, right now I have only one instance
>>> and am hunting another strange problem. They could be related.
>>> assertion failed: !test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state), in fs/btrfs/volumes.c:1150
>>> full log with other stacktraces,
>>> possibly relatedg
>> Looking at the logs, it seems that a dev_replace was started, then
>> suspended. But it wasn't canceled or resumed before the fs devices were
>> closed.
>> I'll investigate further, just throwing some observations out there.
> Thanks. I'm testing the patch revert, no crash after first loop, I'll
> run a few more to be sure as it's not entirely reliable.
> Sending the revert is option of last resort as we're approaching end of
> 5.14 dev cycle and the crash prevents testing (unlike the fuzzer
> warning).

I might be missing something, so any thoughts would be appreciated. But 
I don't think the assertion in btrfs_close_one_device is correct.

 From what I see, this crash happens when close_ctree is called while a 
dev_replace hasn't completed. In close_ctree, we suspend the 
dev_replace, but keep the replace target around so that we can resume 
the dev_replace procedure when we mount the root again. This is the call 


However, since the replace target sticks around, there is a device with 
BTRFS_DEV_STATE_REPLACE_TGT set, and we fail the assertion in 

Two options I can think of:

- We could remove the assertion.

- Or we could clear the BTRFS_DEV_STATE_REPLACE_TGT bit in 
btrfs_dev_replace_suspend_for_unmount. This is fine since the bit is set 
again in btrfs_init_dev_replace if the dev_replace->replace_state is 
BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED. But this approach strikes me as 
a little odd because the device is still the replace target when 
mounting in the future.


  reply	other threads:[~2021-08-12 17:31 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-27  7:13 Desmond Cheong Zhi Xi
2021-07-28 12:58 ` David Sterba
2021-08-12 10:38 ` David Sterba
2021-08-12 15:43   ` Desmond Cheong Zhi Xi
2021-08-12 15:50     ` David Sterba
2021-08-12 17:31       ` Desmond Cheong Zhi Xi [this message]
2021-08-13  8:51         ` David Sterba
2021-08-13  9:57           ` Desmond Cheong Zhi Xi
2021-08-13 10:30             ` David Sterba
2021-08-19 17:11               ` Desmond Cheong Zhi Xi
2021-08-19 17:34                 ` David Sterba
2021-08-20  3:09                   ` Desmond Cheong Zhi Xi
2021-08-20 10:58                     ` David Sterba
2021-08-20 17:53                       ` Desmond Cheong Zhi Xi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \ \ \ \ \ \ \
    --subject='Re: [PATCH v2] btrfs: fix rw device counting in __btrfs_free_extra_devids' \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).