LKML Archive on lore.kernel.org
* [PATCH] md: fix two problems with setting the "re-add" device state.
From: NeilBrown @ 2018-04-26  4:46 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Goldwyn Rodrigues, Linux RAID Mailing List, LKML



If "re-add" is written to the "state" file for a device
which is faulty, this has an effect similar to removing
and re-adding the device.  It should take up the
same slot in the array that it previously had, and
an accelerated (e.g. bitmap-based) rebuild should happen.
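For context, the sysfs interaction looks roughly like this
(md0 and sdb1 are illustrative names, not from this patch, and
the array needs a write-intent bitmap for the rebuild to be
accelerated):

```shell
# Simulate (or observe) a failure on a member device:
echo faulty > /sys/block/md0/md/dev-sdb1/state

# Once md has retired the slot (raid_disk becomes -1), request an
# accelerated re-add instead of a full remove/add cycle:
echo re-add > /sys/block/md0/md/dev-sdb1/state
```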

The slot that "it previously had" is determined by
rdev->saved_raid_disk.
However this is not set when a device fails (only when a device
is added), and it is cleared when resync completes.
This means that "re-add" will normally work once, but may not work a
second time.

This patch includes two fixes.
1/ when a device fails, record the ->raid_disk value in
    ->saved_raid_disk before clearing ->raid_disk
2/ when "re-add" is written to a device for which
    ->saved_raid_disk is not set, fail.
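As a rough illustration, here is a toy Python model (not kernel
code; the field and helper names only mirror struct md_rdev and the
behaviour described above) of why "re-add" worked once but not a
second time, and what the two fixes change:

```python
class Rdev:
    """Minimal stand-in for the md_rdev bookkeeping fields."""
    def __init__(self):
        self.raid_disk = -1
        self.saved_raid_disk = -1
        self.faulty = False

def hot_add(rdev, slot):
    rdev.saved_raid_disk = slot    # recorded when a device is added
    rdev.raid_disk = slot
    rdev.faulty = False

def resync_complete(rdev):
    rdev.saved_raid_disk = -1      # cleared when resync finishes

def fail_device(rdev, patched):
    rdev.faulty = True
    if patched:                    # fix 1: remember the slot on failure
        rdev.saved_raid_disk = rdev.raid_disk
    rdev.raid_disk = -1

def can_accelerated_re_add(rdev):
    # fix 2: "re-add" is refused when no saved slot exists
    return rdev.faulty and rdev.raid_disk == -1 and rdev.saved_raid_disk >= 0

# Unpatched: works once, but not a second time.
d = Rdev()
hot_add(d, 2)
fail_device(d, patched=False)
assert can_accelerated_re_add(d)       # first re-add: OK
hot_add(d, 2)
resync_complete(d)
fail_device(d, patched=False)
assert not can_accelerated_re_add(d)   # second re-add: slot was lost

# Patched: the slot is recorded at failure time, so it works every time.
p = Rdev()
hot_add(p, 2)
fail_device(p, patched=True)
assert can_accelerated_re_add(p)
hot_add(p, 2)
resync_complete(p)
fail_device(p, patched=True)
assert can_accelerated_re_add(p)
```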

I think this is suitable for stable, as the bug can
force a re-added device into a full resync, which
takes much longer and so puts data at greater risk.

Cc: <stable@vger.kernel.org> (v4.1)
Fixes: 97f6cd39da22 ("md-cluster: re-add capabilities")
Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/md/md.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 3bea45e8ccff..ecd4235c6e30 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2853,7 +2853,8 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len)
 			err = 0;
 		}
 	} else if (cmd_match(buf, "re-add")) {
-		if (test_bit(Faulty, &rdev->flags) && (rdev->raid_disk == -1)) {
+		if (test_bit(Faulty, &rdev->flags) && (rdev->raid_disk == -1) &&
+			rdev->saved_raid_disk >= 0) {
 			/* clear_bit is performed _after_ all the devices
 			 * have their local Faulty bit cleared. If any writes
 			 * happen in the meantime in the local node, they
@@ -8641,6 +8642,7 @@ static int remove_and_add_spares(struct mddev *mddev,
 			if (mddev->pers->hot_remove_disk(
 				    mddev, rdev) == 0) {
 				sysfs_unlink_rdev(mddev, rdev);
+				rdev->saved_raid_disk = rdev->raid_disk;
 				rdev->raid_disk = -1;
 				removed++;
 			}
-- 
2.14.0.rc0.dirty




* Re: [PATCH] md: fix two problems with setting the "re-add" device state.
From: Goldwyn Rodrigues @ 2018-04-29  9:44 UTC (permalink / raw)
  To: NeilBrown, Shaohua Li; +Cc: Linux RAID Mailing List, LKML



On 04/25/2018 11:46 PM, NeilBrown wrote:
> [...]

Performing a partial resync instead of a full resync is always
preferable, and far less time-consuming. Thanks!

Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

-- 
Goldwyn

