Linux-Fsdevel Archive on lore.kernel.org
help / color / mirror / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
Cc: linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-nvdimm@lists.01.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, dan.j.williams@intel.com,
	david@fromorbit.com, hch@lst.de, rgoldwyn@suse.de,
	qi.fuli@fujitsu.com, y-goto@fujitsu.com
Subject: Re: [RFC PATCH 2/4] pagemap: introduce ->memory_failure()
Date: Tue, 15 Sep 2020 09:31:04 -0700	[thread overview]
Message-ID: <20200915163104.GG7964@magnolia> (raw)
In-Reply-To: <20200915101311.144269-3-ruansy.fnst@cn.fujitsu.com>

On Tue, Sep 15, 2020 at 06:13:09PM +0800, Shiyang Ruan wrote:
> When memory-failure occurs, we call this function which is implemented
> by each devices.  For fsdax, pmem device implements it.  Pmem device
> will find out the block device where the error page located in, gets the
> filesystem on this block device, and finally call ->storage_lost() to
> handle the error in filesystem layer.
> 
> Normally, a pmem device may contain one or more partitions, each
> partition contains a block device, each block device contains a
> filesystem.  So we are able to find out the filesystem by one offset on
> this pmem device.  However, in other cases, such as mapped device, I
> didn't find a way to obtain the filesystem laying on it.  It is a
> problem need to be fixed.
> 
> Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
> ---
>  block/genhd.c            | 12 ++++++++++++
>  drivers/nvdimm/pmem.c    | 31 +++++++++++++++++++++++++++++++
>  include/linux/genhd.h    |  2 ++
>  include/linux/memremap.h |  3 +++
>  4 files changed, 48 insertions(+)
> 
> diff --git a/block/genhd.c b/block/genhd.c
> index 99c64641c314..e7442b60683e 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -1063,6 +1063,18 @@ struct block_device *bdget_disk(struct gendisk *disk, int partno)
>  }
>  EXPORT_SYMBOL(bdget_disk);
>  
> +struct block_device *bdget_disk_sector(struct gendisk *disk, sector_t sector)
> +{
> +	struct block_device *bdev = NULL;
> +	struct hd_struct *part = disk_map_sector_rcu(disk, sector);
> +
> +	if (part)
> +		bdev = bdget(part_devt(part));
> +
> +	return bdev;
> +}
> +EXPORT_SYMBOL(bdget_disk_sector);
> +
>  /*
>   * print a full list of all partitions - intended for places where the root
>   * filesystem can't be mounted and thus to give the victim some idea of what
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index fab29b514372..3ed96486c883 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -364,9 +364,40 @@ static void pmem_release_disk(void *__pmem)
>  	put_disk(pmem->disk);
>  }
>  
> +static int pmem_pagemap_memory_failure(struct dev_pagemap *pgmap,
> +		struct mf_recover_controller *mfrc)
> +{
> +	struct pmem_device *pdev;
> +	struct block_device *bdev;
> +	sector_t disk_sector;
> +	loff_t bdev_offset;
> +
> +	pdev = container_of(pgmap, struct pmem_device, pgmap);
> +	if (!pdev->disk)
> +		return -ENXIO;
> +
> +	disk_sector = (PFN_PHYS(mfrc->pfn) - pdev->phys_addr) >> SECTOR_SHIFT;

Ah, I see, looking at the current x86 MCE code, the MCE handler gets a
physical address, which is then rounded down to a PFN, which is then
blown back up into a byte address(?) and then rounded down to sectors.
That is then blown back up into a byte address and passed on to XFS,
which rounds it down to fs blocksize.

/me wishes that wasn't so convoluted, but reforming the whole mm poison
system to have smaller blast radii isn't the purpose of this patch. :)

> +	bdev = bdget_disk_sector(pdev->disk, disk_sector);
> +	if (!bdev)
> +		return -ENXIO;
> +
> +	// TODO what if block device contains a mapped device

Find its dev_pagemap_ops and invoke its memory_failure function? ;)

> +	if (!bdev->bd_super)
> +		goto out;
> +
> +	bdev_offset = ((disk_sector - get_start_sect(bdev)) << SECTOR_SHIFT) -
> +			pdev->data_offset;
> +	bdev->bd_super->s_op->storage_lost(bdev->bd_super, bdev_offset, mfrc);

->storage_lost is required for all filesystems?

--D

> +
> +out:
> +	bdput(bdev);
> +	return 0;
> +}
> +
>  static const struct dev_pagemap_ops fsdax_pagemap_ops = {
>  	.kill			= pmem_pagemap_kill,
>  	.cleanup		= pmem_pagemap_cleanup,
> +	.memory_failure		= pmem_pagemap_memory_failure,
>  };
>  
>  static int pmem_attach_disk(struct device *dev,
> diff --git a/include/linux/genhd.h b/include/linux/genhd.h
> index 4ab853461dff..16e9e13e0841 100644
> --- a/include/linux/genhd.h
> +++ b/include/linux/genhd.h
> @@ -303,6 +303,8 @@ static inline void add_disk_no_queue_reg(struct gendisk *disk)
>  extern void del_gendisk(struct gendisk *gp);
>  extern struct gendisk *get_gendisk(dev_t dev, int *partno);
>  extern struct block_device *bdget_disk(struct gendisk *disk, int partno);
> +extern struct block_device *bdget_disk_sector(struct gendisk *disk,
> +			sector_t sector);
>  
>  extern void set_device_ro(struct block_device *bdev, int flag);
>  extern void set_disk_ro(struct gendisk *disk, int flag);
> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> index 5f5b2df06e61..efebefa70d00 100644
> --- a/include/linux/memremap.h
> +++ b/include/linux/memremap.h
> @@ -6,6 +6,7 @@
>  
>  struct resource;
>  struct device;
> +struct mf_recover_controller;
>  
>  /**
>   * struct vmem_altmap - pre-allocated storage for vmemmap_populate
> @@ -87,6 +88,8 @@ struct dev_pagemap_ops {
>  	 * the page back to a CPU accessible page.
>  	 */
>  	vm_fault_t (*migrate_to_ram)(struct vm_fault *vmf);
> +	int (*memory_failure)(struct dev_pagemap *pgmap,
> +			      struct mf_recover_controller *mfrc);
>  };
>  
>  #define PGMAP_ALTMAP_VALID	(1 << 0)
> -- 
> 2.28.0
> 
> 
> 

  reply	other threads:[~2020-09-15 16:34 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-15 10:13 [RFC PATCH 0/4] fsdax: introduce fs query to support reflink Shiyang Ruan
2020-09-15 10:13 ` [RFC PATCH 1/4] fs: introduce ->storage_lost() for memory-failure Shiyang Ruan
2020-09-15 16:16   ` Darrick J. Wong
2020-09-16  2:15     ` Ruan Shiyang
2020-09-15 10:13 ` [RFC PATCH 2/4] pagemap: introduce ->memory_failure() Shiyang Ruan
2020-09-15 16:31   ` Darrick J. Wong [this message]
2020-09-16  2:15     ` Ruan Shiyang
2020-09-15 10:13 ` [RFC PATCH 3/4] mm, fsdax: refactor dax handler in memory-failure Shiyang Ruan
2020-09-15 10:13 ` [RFC PATCH 4/4] fsdax: remove useless (dis)associate functions Shiyang Ruan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200915163104.GG7964@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=dan.j.williams@intel.com \
    --cc=david@fromorbit.com \
    --cc=hch@lst.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=qi.fuli@fujitsu.com \
    --cc=rgoldwyn@suse.de \
    --cc=ruansy.fnst@cn.fujitsu.com \
    --cc=y-goto@fujitsu.com \
    --subject='Re: [RFC PATCH 2/4] pagemap: introduce ->memory_failure()' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).