LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH 1/2] mm: export find_extend_vma and handle_mm_fault for driver use
@ 2014-11-12 22:10 Jesse Barnes
  2014-11-12 22:10 ` [PATCH 2/2] iommu/amd: use handle_mm_fault directly v2 Jesse Barnes
  0 siblings, 1 reply; 4+ messages in thread
From: Jesse Barnes @ 2014-11-12 22:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: jroedel, akpm

This lets drivers like the AMD IOMMUv2 driver handle faults a bit more
simply, rather than doing tricks with page refs and get_user_pages().

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
---
 mm/memory.c | 1 +
 mm/mmap.c   | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index 1cc6bfb..969ff0c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3378,6 +3378,7 @@ int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	return ret;
 }
+EXPORT_SYMBOL_GPL(handle_mm_fault);
 
 #ifndef __PAGETABLE_PUD_FOLDED
 /*
diff --git a/mm/mmap.c b/mm/mmap.c
index 7f85520..2ee7971 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2359,6 +2359,8 @@ find_extend_vma(struct mm_struct *mm, unsigned long addr)
 }
 #endif
 
+EXPORT_SYMBOL_GPL(find_extend_vma);
+
 /*
  * Ok - we have the memory areas we should free on the vma list,
  * so release them, and do the vma updates.
-- 
1.9.1


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 2/2] iommu/amd: use handle_mm_fault directly v2
  2014-11-12 22:10 [PATCH 1/2] mm: export find_extend_vma and handle_mm_fault for driver use Jesse Barnes
@ 2014-11-12 22:10 ` Jesse Barnes
  2015-01-25 13:16   ` Oded Gabbay
  0 siblings, 1 reply; 4+ messages in thread
From: Jesse Barnes @ 2014-11-12 22:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: jroedel, akpm

This could be useful for debug in the future if we want to track
major/minor faults more closely, and also avoids the put_page trick we
used with gup.

In order to do this, we also track the task struct in the PASID state
structure.  This lets us update the appropriate task stats after the
fault has been handled, and may aid with debug in the future as well.

v2: drop task accounting; GPU activity may have been submitted by a
    different thread than the one binding the PASID (Joerg)

Tested-by: Oded Gabbay <oded.gabbay@amd.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
---
 drivers/iommu/amd_iommu_v2.c | 84 ++++++++++++++++++++++++++++----------------
 1 file changed, 53 insertions(+), 31 deletions(-)

diff --git a/drivers/iommu/amd_iommu_v2.c b/drivers/iommu/amd_iommu_v2.c
index 90d734b..9c0d6e2 100644
--- a/drivers/iommu/amd_iommu_v2.c
+++ b/drivers/iommu/amd_iommu_v2.c
@@ -513,45 +513,67 @@ static void finish_pri_tag(struct device_state *dev_state,
 	spin_unlock_irqrestore(&pasid_state->lock, flags);
 }
 
+static void handle_fault_error(struct fault *fault)
+{
+	int status;
+
+	if (!fault->dev_state->inv_ppr_cb) {
+		set_pri_tag_status(fault->state, fault->tag, PPR_INVALID);
+		return;
+	}
+
+	status = fault->dev_state->inv_ppr_cb(fault->dev_state->pdev,
+					      fault->pasid,
+					      fault->address,
+					      fault->flags);
+	switch (status) {
+	case AMD_IOMMU_INV_PRI_RSP_SUCCESS:
+		set_pri_tag_status(fault->state, fault->tag, PPR_SUCCESS);
+		break;
+	case AMD_IOMMU_INV_PRI_RSP_INVALID:
+		set_pri_tag_status(fault->state, fault->tag, PPR_INVALID);
+		break;
+	case AMD_IOMMU_INV_PRI_RSP_FAIL:
+		set_pri_tag_status(fault->state, fault->tag, PPR_FAILURE);
+		break;
+	default:
+		BUG();
+	}
+}
+
 static void do_fault(struct work_struct *work)
 {
 	struct fault *fault = container_of(work, struct fault, work);
-	int npages, write;
-	struct page *page;
+	struct mm_struct *mm;
+	struct vm_area_struct *vma;
+	u64 address;
+	int ret, write;
 
 	write = !!(fault->flags & PPR_FAULT_WRITE);
 
-	down_read(&fault->state->mm->mmap_sem);
-	npages = get_user_pages(NULL, fault->state->mm,
-				fault->address, 1, write, 0, &page, NULL);
-	up_read(&fault->state->mm->mmap_sem);
-
-	if (npages == 1) {
-		put_page(page);
-	} else if (fault->dev_state->inv_ppr_cb) {
-		int status;
-
-		status = fault->dev_state->inv_ppr_cb(fault->dev_state->pdev,
-						      fault->pasid,
-						      fault->address,
-						      fault->flags);
-		switch (status) {
-		case AMD_IOMMU_INV_PRI_RSP_SUCCESS:
-			set_pri_tag_status(fault->state, fault->tag, PPR_SUCCESS);
-			break;
-		case AMD_IOMMU_INV_PRI_RSP_INVALID:
-			set_pri_tag_status(fault->state, fault->tag, PPR_INVALID);
-			break;
-		case AMD_IOMMU_INV_PRI_RSP_FAIL:
-			set_pri_tag_status(fault->state, fault->tag, PPR_FAILURE);
-			break;
-		default:
-			BUG();
-		}
-	} else {
-		set_pri_tag_status(fault->state, fault->tag, PPR_INVALID);
+	mm = fault->state->mm;
+	address = fault->address;
+
+	down_read(&mm->mmap_sem);
+	vma = find_extend_vma(mm, address);
+	if (!vma || address < vma->vm_start) {
+		/* failed to get a vma in the right range */
+		up_read(&mm->mmap_sem);
+		handle_fault_error(fault);
+		goto out;
+	}
+
+	ret = handle_mm_fault(mm, vma, address, write);
+	if (ret & VM_FAULT_ERROR) {
+		/* failed to service fault */
+		up_read(&mm->mmap_sem);
+		handle_fault_error(fault);
+		goto out;
 	}
 
+	up_read(&mm->mmap_sem);
+
+out:
 	finish_pri_tag(fault->dev_state, fault->state, fault->tag);
 
 	put_pasid_state(fault->state);
-- 
1.9.1


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 2/2] iommu/amd: use handle_mm_fault directly v2
  2014-11-12 22:10 ` [PATCH 2/2] iommu/amd: use handle_mm_fault directly v2 Jesse Barnes
@ 2015-01-25 13:16   ` Oded Gabbay
  2015-01-26 23:01     ` Jesse Barnes
  0 siblings, 1 reply; 4+ messages in thread
From: Oded Gabbay @ 2015-01-25 13:16 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: linux-kernel, jroedel, akpm, iommu, Bridgman, John, Elifaz, Dana



On 11/13/2014 12:10 AM, Jesse Barnes wrote:
> This could be useful for debug in the future if we want to track
> major/minor faults more closely, and also avoids the put_page trick we
> used with gup.
>
> In order to do this, we also track the task struct in the PASID state
> structure.  This lets us update the appropriate task stats after the
> fault has been handled, and may aid with debug in the future as well.
>
> v2: drop task accounting; GPU activity may have been submitted by a
>      different thread than the one binding the PASID (Joerg)
>
> Tested-by: Oded Gabbay<oded.gabbay@amd.com>
> Signed-off-by: Jesse Barnes<jbarnes@virtuousgeek.org>

Hi Jesse,

I know I tested your patch a few months ago, but we have a new feature (still 
internally) in the driver, which has some conflicts with this patch.

Our feature is basically doing "exception handling" by registering a callback 
function with the iommu driver in inv_ppr_cb.

Now, with the old code (we used 3.17.2 until a few days ago), this callback 
function was called in, at least, three use-cases (which we are testing):

(1) Writing to a "bad" system memory address, which is *not* in the process's 
memory address space.

(2) Writing to a read-only page, which is inside the process's memory address space

(3) Reading from a page without permissions, which is inside the process's 
memory address space

With the new code (3.19-rc5), this callback is only called in the first 
use-case, while (2) and (3) are handled in handle_mm_fault(), which is now 
called from do_fault. The return value of handle_mm_fault() is 0, so 
handle_fault_error() is not called and amdkfd doesn't get notification, hence 
our test fails.

This is a problem for us as we want to propagate these exceptions to the user 
space HSA runtime, so it could handle them.

I have 2 questions:

1. Why don't we call inv_ppr_cb() in any case ?
2. How come handle_mm_fault() returns 0 in cases (2) and (3) ? Or in other 
words, what is considered to be a success in handle_mm_fault() and is it visible 
to the user-space process ?

Thanks,

	Oded

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 2/2] iommu/amd: use handle_mm_fault directly v2
  2015-01-25 13:16   ` Oded Gabbay
@ 2015-01-26 23:01     ` Jesse Barnes
  0 siblings, 0 replies; 4+ messages in thread
From: Jesse Barnes @ 2015-01-26 23:01 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: linux-kernel, jroedel, akpm, iommu, Bridgman, John, Elifaz, Dana

On Sun, 25 Jan 2015 15:16:44 +0200
Oded Gabbay <oded.gabbay@amd.com> wrote:

> 
> 
> On 11/13/2014 12:10 AM, Jesse Barnes wrote:
> > This could be useful for debug in the future if we want to track
> > major/minor faults more closely, and also avoids the put_page trick we
> > used with gup.
> >
> > In order to do this, we also track the task struct in the PASID state
> > structure.  This lets us update the appropriate task stats after the
> > fault has been handled, and may aid with debug in the future as well.
> >
> > v2: drop task accounting; GPU activity may have been submitted by a
> >      different thread than the one binding the PASID (Joerg)
> >
> > Tested-by: Oded Gabbay<oded.gabbay@amd.com>
> > Signed-off-by: Jesse Barnes<jbarnes@virtuousgeek.org>
> 
> Hi Jesse,
> 
> I know I tested your patch a few months ago, but we have a new feature (still 
> internally) in the driver, which has some conflicts with this patch.
> 
> Our feature is basically doing "exception handling" by registering a callback 
> function with the iommu driver in inv_ppr_cb.
> 
> Now, with the old code (we used 3.17.2 until a few days ago), this callback 
> function was called in, at least, three use-cases (which we are testing):
> 
> (1) Writing to a "bad" system memory address, which is *not* in the process's 
> memory address space.
> 
> (2) Writing to a read-only page, which is inside the process's memory address space
> 
> (3) Reading from a page without permissions, which is inside the process's 
> memory address space
> 
> With the new code (3.19-rc5), this callback is only called in the first 
> use-case, while (2) and (3) are handled in handle_mm_fault(), which is now 
> called from do_fault. The return value of handle_mm_fault() is 0, so 
> handle_fault_error() is not called and amdkfd doesn't get notification, hence 
> our test fails.
> 
> This is a problem for us as we want to propagate these exceptions to the user 
> space HSA runtime, so it could handle them.
> 
> I have 2 questions:
> 
> 1. Why don't we call inv_ppr_cb() in any case ?

We do if we fail to allocate the vma or it's in the wrong location, but
we could extend the do_fault() handling to do it in more cases.

> 2. How come handle_mm_fault() returns 0 in cases (2) and (3) ? Or in other 
> words, what is considered to be a success in handle_mm_fault() and is it visible 
> to the user-space process ?

handle_mm_fault() is somewhat of a low level function.  We can catch
more cases in our own do_fault() code if we need to.   The x86
__do_page_fault is probably a good reference.  I mainly tried to match
existing behavior when I added the handle_mm_fault(), but may have
missed stuff.  As I said, we can extend our do_fault() to handle all
the cases we want prior to calling handle_mm_fault().

Thanks,
-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2015-01-26 23:08 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-11-12 22:10 [PATCH 1/2] mm: export find_extend_vma and handle_mm_fault for driver use Jesse Barnes
2014-11-12 22:10 ` [PATCH 2/2] iommu/amd: use handle_mm_fault directly v2 Jesse Barnes
2015-01-25 13:16   ` Oded Gabbay
2015-01-26 23:01     ` Jesse Barnes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).