Linux-Fsdevel Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Vivek Goyal <vgoyal@redhat.com>
To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	virtio-fs@redhat.com
Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com,
	dgilbert@redhat.com, Dan Williams <dan.j.williams@intel.com>,
	linux-nvdimm@lists.01.org
Subject: [PATCH v2 02/20] dax: Create a range version of dax_layout_busy_page()
Date: Fri,  7 Aug 2020 15:55:08 -0400	[thread overview]
Message-ID: <20200807195526.426056-3-vgoyal@redhat.com> (raw)
In-Reply-To: <20200807195526.426056-1-vgoyal@redhat.com>

virtiofs device has a range of memory which is mapped into file inodes
using dax. This memory is mapped in qemu on host and maps different
sections of real file on host. Size of this memory is limited
(determined by administrator) and depending on filesystem size, we will
soon reach a situation where all the memory is in use and we need to
reclaim some.

As part of reclaim process, we will need to make sure that there are
no active references to pages (taken by get_user_pages()) on the memory
range we are trying to reclaim. I am planning to use
dax_layout_busy_page() for this. But in current form this is per inode
and scans through all the pages of the inode.

We want to reclaim only a portion of memory (say 2MB page). So we want
to make sure that only that 2MB range of pages do not have any
references  (and don't want to unmap all the pages of inode).

Hence, create a range version of this function named
dax_layout_busy_page_range() which can be used to pass a range which
needs to be unmapped.

Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-nvdimm@lists.01.org
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/dax.c            | 66 ++++++++++++++++++++++++++++++++-------------
 include/linux/dax.h |  6 +++++
 2 files changed, 54 insertions(+), 18 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 11b16729b86f..0d51b0fbb489 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -558,27 +558,20 @@ static void *grab_mapping_entry(struct xa_state *xas,
 	return xa_mk_internal(VM_FAULT_FALLBACK);
 }
 
-/**
- * dax_layout_busy_page - find first pinned page in @mapping
- * @mapping: address space to scan for a page with ref count > 1
- *
- * DAX requires ZONE_DEVICE mapped pages. These pages are never
- * 'onlined' to the page allocator so they are considered idle when
- * page->count == 1. A filesystem uses this interface to determine if
- * any page in the mapping is busy, i.e. for DMA, or other
- * get_user_pages() usages.
- *
- * It is expected that the filesystem is holding locks to block the
- * establishment of new mappings in this address_space. I.e. it expects
- * to be able to run unmap_mapping_range() and subsequently not race
- * mapping_mapped() becoming true.
+/*
+ * Partial pages are included. If end is LLONG_MAX, pages in the range from
+ * start to end of the file are inluded.
  */
-struct page *dax_layout_busy_page(struct address_space *mapping)
+struct page *dax_layout_busy_page_range(struct address_space *mapping,
+					loff_t start, loff_t end)
 {
-	XA_STATE(xas, &mapping->i_pages, 0);
 	void *entry;
 	unsigned int scanned = 0;
 	struct page *page = NULL;
+	pgoff_t start_idx = start >> PAGE_SHIFT;
+	pgoff_t end_idx = end >> PAGE_SHIFT;
+	XA_STATE(xas, &mapping->i_pages, start_idx);
+	loff_t len, lstart = round_down(start, PAGE_SIZE);
 
 	/*
 	 * In the 'limited' case get_user_pages() for dax is disabled.
@@ -589,6 +582,22 @@ struct page *dax_layout_busy_page(struct address_space *mapping)
 	if (!dax_mapping(mapping) || !mapping_mapped(mapping))
 		return NULL;
 
+	/* If end == LLONG_MAX, all pages from start to till end of file */
+	if (end == LLONG_MAX) {
+		end_idx = ULONG_MAX;
+		len = 0;
+	} else {
+		/* length is being calculated from lstart and not start.
+		 * This is due to behavior of unmap_mapping_range(). If
+		 * start is say 4094 and end is on 4096 then we want to
+		 * unamp two pages, idx 0 and 1. But unmap_mapping_range()
+		 * will unmap only page at idx 0. If we calculate len
+		 * from the rounded down start, this problem should not
+		 * happen.
+		 */
+		len = end - lstart + 1;
+	}
+
 	/*
 	 * If we race get_user_pages_fast() here either we'll see the
 	 * elevated page count in the iteration and wait, or
@@ -601,10 +610,10 @@ struct page *dax_layout_busy_page(struct address_space *mapping)
 	 * guaranteed to either see new references or prevent new
 	 * references from being established.
 	 */
-	unmap_mapping_range(mapping, 0, 0, 0);
+	unmap_mapping_range(mapping, start, len, 0);
 
 	xas_lock_irq(&xas);
-	xas_for_each(&xas, entry, ULONG_MAX) {
+	xas_for_each(&xas, entry, end_idx) {
 		if (WARN_ON_ONCE(!xa_is_value(entry)))
 			continue;
 		if (unlikely(dax_is_locked(entry)))
@@ -625,6 +634,27 @@ struct page *dax_layout_busy_page(struct address_space *mapping)
 	xas_unlock_irq(&xas);
 	return page;
 }
+EXPORT_SYMBOL_GPL(dax_layout_busy_page_range);
+
+/**
+ * dax_layout_busy_page - find first pinned page in @mapping
+ * @mapping: address space to scan for a page with ref count > 1
+ *
+ * DAX requires ZONE_DEVICE mapped pages. These pages are never
+ * 'onlined' to the page allocator so they are considered idle when
+ * page->count == 1. A filesystem uses this interface to determine if
+ * any page in the mapping is busy, i.e. for DMA, or other
+ * get_user_pages() usages.
+ *
+ * It is expected that the filesystem is holding locks to block the
+ * establishment of new mappings in this address_space. I.e. it expects
+ * to be able to run unmap_mapping_range() and subsequently not race
+ * mapping_mapped() becoming true.
+ */
+struct page *dax_layout_busy_page(struct address_space *mapping)
+{
+	return dax_layout_busy_page_range(mapping, 0, 0);
+}
 EXPORT_SYMBOL_GPL(dax_layout_busy_page);
 
 static int __dax_invalidate_entry(struct address_space *mapping,
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 6904d4e0b2e0..9016929db4c6 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -141,6 +141,7 @@ int dax_writeback_mapping_range(struct address_space *mapping,
 		struct dax_device *dax_dev, struct writeback_control *wbc);
 
 struct page *dax_layout_busy_page(struct address_space *mapping);
+struct page *dax_layout_busy_page_range(struct address_space *mapping, loff_t start, loff_t end);
 dax_entry_t dax_lock_page(struct page *page);
 void dax_unlock_page(struct page *page, dax_entry_t cookie);
 #else
@@ -171,6 +172,11 @@ static inline struct page *dax_layout_busy_page(struct address_space *mapping)
 	return NULL;
 }
 
+static inline struct page *dax_layout_busy_page_range(struct address_space *mapping, pgoff_t start, pgoff_t nr_pages)
+{
+	return NULL;
+}
+
 static inline int dax_writeback_mapping_range(struct address_space *mapping,
 		struct dax_device *dax_dev, struct writeback_control *wbc)
 {
-- 
2.25.4


  parent reply	other threads:[~2020-08-07 19:57 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-07 19:55 [PATCH v2 00/20] virtiofs: Add DAX support Vivek Goyal
2020-08-07 19:55 ` [PATCH v2 01/20] dax: Modify bdev_dax_pgoff() to handle NULL bdev Vivek Goyal
2020-08-17 16:57   ` Jan Kara
2020-08-07 19:55 ` Vivek Goyal [this message]
2020-08-17 16:53   ` [PATCH v2 02/20] dax: Create a range version of dax_layout_busy_page() Jan Kara
2020-08-17 17:22     ` Vivek Goyal
2020-08-07 19:55 ` [PATCH v2 03/20] virtio: Add get_shm_region method Vivek Goyal
2020-08-10 13:47   ` Michael S. Tsirkin
2020-08-10 13:54     ` Vivek Goyal
2020-08-10 14:02   ` Michael S. Tsirkin
2020-08-10 14:06   ` Michael S. Tsirkin
2020-08-07 19:55 ` [PATCH v2 04/20] virtio: Implement get_shm_region for PCI transport Vivek Goyal
2020-08-10 14:05   ` Michael S. Tsirkin
2020-08-10 14:50     ` Vivek Goyal
     [not found]       ` <CAAfnVBk+Hmcm2ftd3wOK-P2NyYQ7z4Wrf1JKhLJaNkCZBLoo6g@mail.gmail.com>
2020-08-17 20:29         ` Vivek Goyal
2020-08-07 19:55 ` [PATCH v2 05/20] virtio: Implement get_shm_region for MMIO transport Vivek Goyal
2020-08-10 14:03   ` Michael S. Tsirkin
2020-08-07 19:55 ` [PATCH v2 06/20] virtiofs: Provide a helper function for virtqueue initialization Vivek Goyal
2020-08-07 19:55 ` [PATCH v2 07/20] fuse: Get rid of no_mount_options Vivek Goyal
2020-08-07 19:55 ` [PATCH v2 08/20] fuse,virtiofs: Add a mount option to enable dax Vivek Goyal
2020-08-07 19:55 ` [PATCH v2 09/20] virtio_fs, dax: Set up virtio_fs dax_device Vivek Goyal
2020-08-07 19:55 ` [PATCH v2 10/20] fuse,virtiofs: Keep a list of free dax memory ranges Vivek Goyal
2020-08-07 19:55 ` [PATCH v2 11/20] fuse: implement FUSE_INIT map_alignment field Vivek Goyal
2020-08-07 19:55 ` [PATCH v2 12/20] fuse: Introduce setupmapping/removemapping commands Vivek Goyal
2020-08-07 19:55 ` [PATCH v2 13/20] fuse, dax: Implement dax read/write operations Vivek Goyal
2020-08-10 22:06   ` Dave Chinner
2020-08-11 17:44     ` Vivek Goyal
2020-08-07 19:55 ` [PATCH v2 14/20] fuse,dax: add DAX mmap support Vivek Goyal
2020-08-07 19:55 ` [PATCH v2 15/20] fuse, dax: Take ->i_mmap_sem lock during dax page fault Vivek Goyal
2020-08-10 22:22   ` Dave Chinner
2020-08-11 17:55     ` Vivek Goyal
2020-08-12  1:23       ` Dave Chinner
2020-08-12 21:10         ` Vivek Goyal
2020-08-13  5:12           ` Dave Chinner
2020-08-07 19:55 ` [PATCH v2 16/20] fuse,virtiofs: Define dax address space operations Vivek Goyal
2020-08-07 19:55 ` [PATCH v2 17/20] fuse,virtiofs: Maintain a list of busy elements Vivek Goyal
2020-08-07 19:55 ` [PATCH v2 18/20] fuse: Release file in process context Vivek Goyal
2020-08-10  8:29   ` Miklos Szeredi
2020-08-10 15:48     ` Vivek Goyal
2020-08-10 19:37     ` Vivek Goyal
2020-08-10 19:39       ` Miklos Szeredi
2020-08-07 19:55 ` [PATCH v2 19/20] fuse: Take inode lock for dax inode truncation Vivek Goyal
2020-08-07 19:55 ` [PATCH v2 20/20] fuse,virtiofs: Add logic to free up a memory range Vivek Goyal
2020-08-10  7:29 ` [PATCH v2 00/20] virtiofs: Add DAX support Miklos Szeredi
2020-08-10 13:08   ` Vivek Goyal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200807195526.426056-3-vgoyal@redhat.com \
    --to=vgoyal@redhat.com \
    --cc=dan.j.williams@intel.com \
    --cc=dgilbert@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=miklos@szeredi.hu \
    --cc=stefanha@redhat.com \
    --cc=virtio-fs@redhat.com \
    --subject='Re: [PATCH v2 02/20] dax: Create a range version of dax_layout_busy_page()' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).