[PATCH v4 1/2] mm: reorganize internal_get_user_pages_fast()
From: Jason Gunthorpe @ 2020-11-10 23:44 UTC
To: linux-kernel, Peter Xu, Linus Torvalds
Cc: Ahmed S. Darwish, Andrea Arcangeli, Andrew Morton,
Christoph Hellwig, Hugh Dickins, Jan Kara, Jann Horn,
John Hubbard, Kirill Shutemov, Kirill Tkhai, Linux-MM,
Michal Hocko, Oleg Nesterov
The next patch in this series makes the lockless flow a little more
complex, so move the entire block into a new function and remove a level
of indentation. Tidy a bit of cruft:
- addr is always the same as start, so use start
- Use the modern check_add_overflow() to compute end = start + len (see
  the sketch below)
- nr_pinned/pages << PAGE_SHIFT needs the LHS to be unsigned long to
  avoid shift overflow; make the variables unsigned long so no casts
  are needed at either site (nr_pinned was missing its cast)
- The handling of ret and nr_pinned can be streamlined a bit
No functional change.
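
To make the two fixes concrete, here is a minimal sketch (illustrative
only, not part of the patch; check_add_overflow() is the helper from
include/linux/overflow.h):

	/* On 64-bit, 0x200000 4K pages is 8GB, which does not fit in
	 * the 32-bit int the shift would otherwise be evaluated in. */
	int nr_pages = 0x200000;
	unsigned long bad = nr_pages << PAGE_SHIFT;	/* overflows in int */
	unsigned long len = (unsigned long)nr_pages << PAGE_SHIFT; /* ok */

	/* check_add_overflow() replaces the old 'end <= start' test and
	 * returns true if start + len wraps the address space: */
	unsigned long end;
	if (check_add_overflow(start, len, &end))
		return 0;

With nr_pages itself changed to unsigned long, as this patch does, the
cast disappears at both call sites.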
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
mm/gup.c | 99 ++++++++++++++++++++++++++++++--------------------------
1 file changed, 54 insertions(+), 45 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index 98eb8e6d2609c3..c7e24301860abb 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2677,13 +2677,43 @@ static int __gup_longterm_unlocked(unsigned long start, int nr_pages,
return ret;
}
-static int internal_get_user_pages_fast(unsigned long start, int nr_pages,
+static unsigned long lockless_pages_from_mm(unsigned long start,
+ unsigned long end,
+ unsigned int gup_flags,
+ struct page **pages)
+{
+ unsigned long flags;
+ int nr_pinned = 0;
+
+ if (!IS_ENABLED(CONFIG_HAVE_FAST_GUP) ||
+ !gup_fast_permitted(start, end))
+ return 0;
+
+ /*
+ * Disable interrupts. The nested form is used, in order to allow full,
+ * general purpose use of this routine.
+ *
+ * With interrupts disabled, we block page table pages from being freed
+ * from under us. See struct mmu_table_batch comments in
+ * include/asm-generic/tlb.h for more details.
+ *
+ * We do not adopt an rcu_read_lock() here as we also want to block IPIs
+ * that come from THPs splitting.
+ */
+ local_irq_save(flags);
+ gup_pgd_range(start, end, gup_flags, pages, &nr_pinned);
+ local_irq_restore(flags);
+ return nr_pinned;
+}
+
+static int internal_get_user_pages_fast(unsigned long start,
+ unsigned long nr_pages,
unsigned int gup_flags,
struct page **pages)
{
- unsigned long addr, len, end;
- unsigned long flags;
- int nr_pinned = 0, ret = 0;
+ unsigned long len, end;
+ unsigned long nr_pinned;
+ int ret;
if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM |
FOLL_FORCE | FOLL_PIN | FOLL_GET |
@@ -2697,54 +2727,33 @@ static int internal_get_user_pages_fast(unsigned long start, int nr_pages,
might_lock_read(&current->mm->mmap_lock);
start = untagged_addr(start) & PAGE_MASK;
- addr = start;
- len = (unsigned long) nr_pages << PAGE_SHIFT;
- end = start + len;
-
- if (end <= start)
+ len = nr_pages << PAGE_SHIFT;
+ if (check_add_overflow(start, len, &end))
return 0;
if (unlikely(!access_ok((void __user *)start, len)))
return -EFAULT;
- /*
- * Disable interrupts. The nested form is used, in order to allow
- * full, general purpose use of this routine.
- *
- * With interrupts disabled, we block page table pages from being
- * freed from under us. See struct mmu_table_batch comments in
- * include/asm-generic/tlb.h for more details.
- *
- * We do not adopt an rcu_read_lock(.) here as we also want to
- * block IPIs that come from THPs splitting.
- */
- if (IS_ENABLED(CONFIG_HAVE_FAST_GUP) && gup_fast_permitted(start, end)) {
- unsigned long fast_flags = gup_flags;
-
- local_irq_save(flags);
- gup_pgd_range(addr, end, fast_flags, pages, &nr_pinned);
- local_irq_restore(flags);
- ret = nr_pinned;
- }
-
- if (nr_pinned < nr_pages && !(gup_flags & FOLL_FAST_ONLY)) {
- /* Try to get the remaining pages with get_user_pages */
- start += nr_pinned << PAGE_SHIFT;
- pages += nr_pinned;
+ nr_pinned = lockless_pages_from_mm(start, end, gup_flags, pages);
+ if (nr_pinned == nr_pages || gup_flags & FOLL_FAST_ONLY)
+ return nr_pinned;
- ret = __gup_longterm_unlocked(start, nr_pages - nr_pinned,
- gup_flags, pages);
-
- /* Have to be a bit careful with return values */
- if (nr_pinned > 0) {
- if (ret < 0)
- ret = nr_pinned;
- else
- ret += nr_pinned;
- }
+ /* Slow path: try to get the remaining pages with get_user_pages */
+ start += nr_pinned << PAGE_SHIFT;
+ pages += nr_pinned;
+ ret = __gup_longterm_unlocked(start, nr_pages - nr_pinned, gup_flags,
+ pages);
+ if (ret < 0) {
+ /*
+ * The caller has to unpin the pages we already pinned so
+ * returning -errno is not an option
+ */
+ if (nr_pinned)
+ return nr_pinned;
+ return ret;
}
-
- return ret;
+ return ret + nr_pinned;
}
+
/**
* get_user_pages_fast_only() - pin user pages in memory
* @start: starting user address
--
2.29.2
Re: [PATCH v4 0/2] Add a seqcount between gup_fast and copy_page_range()
From: Linus Torvalds @ 2020-11-11 18:27 UTC
To: Jason Gunthorpe
Cc: Linux Kernel Mailing List, Peter Xu, Ahmed S. Darwish,
Andrea Arcangeli, Andrew Morton, Aneesh Kumar K.V,
Christoph Hellwig, Hugh Dickins, Jan Kara, Jann Horn,
John Hubbard, Kirill Shutemov, Kirill Tkhai, Leon Romanovsky,
Linux-MM, Michal Hocko, Oleg Nesterov
On Tue, Nov 10, 2020 at 3:44 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> As discussed and suggested by Linus use a seqcount to close the small race
> between gup_fast and copy_page_range().
>
> Ahmed confirms that raw_write_seqcount_begin() is the correct API to use
> in this case and it doesn't trigger any lockdeps.
>
> I was able to test it using two threads, one forking and the other using
> ibv_reg_mr() to trigger GUP fast. Modifying copy_page_range() to sleep
> made the window large enough to hit reliably and exercise the logic.
Looks all good to me.
Linus
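
For reference, the seqcount pattern being confirmed above looks roughly
like this (a sketch only; the field name write_protect_seq and the
exact placement come from patch 2/2, which is not shown in this
excerpt):

	/* fork side, in copy_page_range(), with mmap_lock held for write: */
	raw_write_seqcount_begin(&src_mm->write_protect_seq);
	/* ... copy page tables, write-protecting shared COW pages ... */
	raw_write_seqcount_end(&src_mm->write_protect_seq);

	/* gup_fast side: sample the count before the lockless walk and
	 * recheck it afterwards; back out if a fork ran concurrently. */
	unsigned int seq = raw_read_seqcount(&current->mm->write_protect_seq);
	if (seq & 1)	/* odd: a fork is mid-copy right now */
		return 0;
	/* ... lockless page table walk pins nr_pinned pages ... */
	if (read_seqcount_retry(&current->mm->write_protect_seq, seq)) {
		unpin_user_pages(pages, nr_pinned);
		return 0;
	}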