LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH] IB/umem: use tgid instead of pid in ib_umem structure
@ 2018-05-03 14:04 Lidong Chen
2018-05-03 15:33 ` Jason Gunthorpe
0 siblings, 1 reply; 16+ messages in thread
From: Lidong Chen @ 2018-05-03 14:04 UTC (permalink / raw)
To: dledford, jgg, akpm, qing.huang, leon, artemyko, dan.j.williams
Cc: linux-rdma, linux-kernel, adido, galsha, aviadye, Lidong Chen
The userspace may invoke ibv_reg_mr and ibv_dereg_mr by different threads.
If when ibv_dereg_mr invoke and the thread which invoked ibv_reg_mr has
exited, get_pid_task will return NULL, ib_umem_release does not decrease
mm->pinned_vm. This patch fixes it by use tgid.
Signed-off-by: Lidong Chen <lidongchen@tencent.com>
---
drivers/infiniband/core/umem.c | 12 ++++++------
include/rdma/ib_umem.h | 2 +-
2 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 9a4e899..8813ba5 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -119,7 +119,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
umem->length = size;
umem->address = addr;
umem->page_shift = PAGE_SHIFT;
- umem->pid = get_task_pid(current, PIDTYPE_PID);
+ umem->tgid = get_task_pid(current->group_leader, PIDTYPE_PID);
/*
* We ask for writable memory if any of the following
* access flags are set. "Local write" and "remote write"
@@ -132,7 +132,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_MW_BIND));
if (access & IB_ACCESS_ON_DEMAND) {
- put_pid(umem->pid);
+ put_pid(umem->tgid);
ret = ib_umem_odp_get(context, umem, access);
if (ret) {
kfree(umem);
@@ -148,7 +148,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
page_list = (struct page **) __get_free_page(GFP_KERNEL);
if (!page_list) {
- put_pid(umem->pid);
+ put_pid(umem->tgid);
kfree(umem);
return ERR_PTR(-ENOMEM);
}
@@ -231,7 +231,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
if (ret < 0) {
if (need_release)
__ib_umem_release(context->device, umem, 0);
- put_pid(umem->pid);
+ put_pid(umem->tgid);
kfree(umem);
} else
current->mm->pinned_vm = locked;
@@ -274,8 +274,8 @@ void ib_umem_release(struct ib_umem *umem)
__ib_umem_release(umem->context->device, umem, 1);
- task = get_pid_task(umem->pid, PIDTYPE_PID);
- put_pid(umem->pid);
+ task = get_pid_task(umem->tgid, PIDTYPE_PID);
+ put_pid(umem->tgid);
if (!task)
goto out;
mm = get_task_mm(task);
diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index 23159dd..2398849 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -48,7 +48,7 @@ struct ib_umem {
int writable;
int hugetlb;
struct work_struct work;
- struct pid *pid;
+ struct pid *tgid;
struct mm_struct *mm;
unsigned long diff;
struct ib_umem_odp *odp_data;
--
1.8.3.1
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] IB/umem: use tgid instead of pid in ib_umem structure
2018-05-03 14:04 [PATCH] IB/umem: use tgid instead of pid in ib_umem structure Lidong Chen
@ 2018-05-03 15:33 ` Jason Gunthorpe
2018-05-03 18:12 ` Leon Romanovsky
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Jason Gunthorpe @ 2018-05-03 15:33 UTC (permalink / raw)
To: Lidong Chen
Cc: dledford, akpm, qing.huang, leon, artemyko, dan.j.williams,
linux-rdma, linux-kernel, adido, galsha, aviadye, Lidong Chen
On Thu, May 03, 2018 at 10:04:34PM +0800, Lidong Chen wrote:
> The userspace may invoke ibv_reg_mr and ibv_dereg_mr by different threads.
> If when ibv_dereg_mr invoke and the thread which invoked ibv_reg_mr has
> exited, get_pid_task will return NULL, ib_umem_release does not decrease
> mm->pinned_vm. This patch fixes it by use tgid.
>
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> ---
> drivers/infiniband/core/umem.c | 12 ++++++------
> include/rdma/ib_umem.h | 2 +-
> 2 files changed, 7 insertions(+), 7 deletions(-)
Why are we even using a struct pid for this? Does anyone know?
I'm surprised that struct task isn't held in the struct ib_umem..
Is group_leader really OK and always guarenteed to return the same
struct mm?? For some reason I have this recollection that the leader
can change under some situation..
> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index 9a4e899..8813ba5 100644
> --- a/drivers/infiniband/core/umem.c
> +++ b/drivers/infiniband/core/umem.c
> @@ -119,7 +119,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
> umem->length = size;
> umem->address = addr;
> umem->page_shift = PAGE_SHIFT;
> - umem->pid = get_task_pid(current, PIDTYPE_PID);
> + umem->tgid = get_task_pid(current->group_leader, PIDTYPE_PID);
> /*
> * We ask for writable memory if any of the following
> * access flags are set. "Local write" and "remote write"
> @@ -132,7 +132,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
> IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_MW_BIND));
>
> if (access & IB_ACCESS_ON_DEMAND) {
> - put_pid(umem->pid);
> + put_pid(umem->tgid);
> ret = ib_umem_odp_get(context, umem, access);
> if (ret) {
> kfree(umem);
> @@ -148,7 +148,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>
> page_list = (struct page **) __get_free_page(GFP_KERNEL);
> if (!page_list) {
> - put_pid(umem->pid);
> + put_pid(umem->tgid);
> kfree(umem);
> return ERR_PTR(-ENOMEM);
> }
> @@ -231,7 +231,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
> if (ret < 0) {
> if (need_release)
> __ib_umem_release(context->device, umem, 0);
> - put_pid(umem->pid);
> + put_pid(umem->tgid);
> kfree(umem);
> } else
> current->mm->pinned_vm = locked;
> @@ -274,8 +274,8 @@ void ib_umem_release(struct ib_umem *umem)
>
> __ib_umem_release(umem->context->device, umem, 1);
>
> - task = get_pid_task(umem->pid, PIDTYPE_PID);
> - put_pid(umem->pid);
> + task = get_pid_task(umem->tgid, PIDTYPE_PID);
> + put_pid(umem->tgid);
> if (!task)
> goto out;
> mm = get_task_mm(task);
> diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
> index 23159dd..2398849 100644
> --- a/include/rdma/ib_umem.h
> +++ b/include/rdma/ib_umem.h
> @@ -48,7 +48,7 @@ struct ib_umem {
> int writable;
> int hugetlb;
> struct work_struct work;
> - struct pid *pid;
> + struct pid *tgid;
> struct mm_struct *mm;
> unsigned long diff;
> struct ib_umem_odp *odp_data;
> --
> 1.8.3.1
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] IB/umem: use tgid instead of pid in ib_umem structure
2018-05-03 15:33 ` Jason Gunthorpe
@ 2018-05-03 18:12 ` Leon Romanovsky
2018-05-03 18:26 ` Jason Gunthorpe
2018-05-04 2:41 ` 858585 jemmy
2018-05-04 3:14 ` 858585 jemmy
2 siblings, 1 reply; 16+ messages in thread
From: Leon Romanovsky @ 2018-05-03 18:12 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Lidong Chen, dledford, akpm, qing.huang, artemyko,
dan.j.williams, linux-rdma, linux-kernel, adido, galsha, aviadye,
Lidong Chen
[-- Attachment #1: Type: text/plain, Size: 919 bytes --]
On Thu, May 03, 2018 at 09:33:10AM -0600, Jason Gunthorpe wrote:
> On Thu, May 03, 2018 at 10:04:34PM +0800, Lidong Chen wrote:
> > The userspace may invoke ibv_reg_mr and ibv_dereg_mr by different threads.
> > If when ibv_dereg_mr invoke and the thread which invoked ibv_reg_mr has
> > exited, get_pid_task will return NULL, ib_umem_release does not decrease
> > mm->pinned_vm. This patch fixes it by use tgid.
> >
> > Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> > ---
> > drivers/infiniband/core/umem.c | 12 ++++++------
> > include/rdma/ib_umem.h | 2 +-
> > 2 files changed, 7 insertions(+), 7 deletions(-)
>
> Why are we even using a struct pid for this? Does anyone know?
>
Can it be related to "fork" support?
> I'm surprised that struct task isn't held in the struct ib_umem..
>
I think that this code can be removed and all accesses to mm_struct can
be done with "current->mm".
Thanks
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] IB/umem: use tgid instead of pid in ib_umem structure
2018-05-03 18:12 ` Leon Romanovsky
@ 2018-05-03 18:26 ` Jason Gunthorpe
2018-05-03 18:43 ` Leon Romanovsky
0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2018-05-03 18:26 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Lidong Chen, dledford, akpm, qing.huang, artemyko,
dan.j.williams, linux-rdma, linux-kernel, adido, galsha, aviadye,
Lidong Chen
On Thu, May 03, 2018 at 09:12:35PM +0300, Leon Romanovsky wrote:
> On Thu, May 03, 2018 at 09:33:10AM -0600, Jason Gunthorpe wrote:
> > On Thu, May 03, 2018 at 10:04:34PM +0800, Lidong Chen wrote:
> > > The userspace may invoke ibv_reg_mr and ibv_dereg_mr by different threads.
> > > If when ibv_dereg_mr invoke and the thread which invoked ibv_reg_mr has
> > > exited, get_pid_task will return NULL, ib_umem_release does not decrease
> > > mm->pinned_vm. This patch fixes it by use tgid.
> > >
> > > Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> > > drivers/infiniband/core/umem.c | 12 ++++++------
> > > include/rdma/ib_umem.h | 2 +-
> > > 2 files changed, 7 insertions(+), 7 deletions(-)
> >
> > Why are we even using a struct pid for this? Does anyone know?
> >
>
> Can it be related to "fork" support?
Not sure..
Ideally we want to hold the struct mm, but we can't hold it long
term, so pid is a surrogate for that.
> > I'm surprised that struct task isn't held in the struct ib_umem..
> >
>
> I think that this code can be removed and all accesses to mm_struct can
> be done with "current->mm".
That sounds wrong for fork support, as the mm used in destroy MUST
exactly match the mm used in create..
How does this accounting work in fork anyhow?
Jason
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] IB/umem: use tgid instead of pid in ib_umem structure
2018-05-03 18:26 ` Jason Gunthorpe
@ 2018-05-03 18:43 ` Leon Romanovsky
2018-05-03 22:01 ` Jason Gunthorpe
0 siblings, 1 reply; 16+ messages in thread
From: Leon Romanovsky @ 2018-05-03 18:43 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Lidong Chen, dledford, akpm, qing.huang, artemyko,
dan.j.williams, linux-rdma, linux-kernel, adido, galsha, aviadye,
Lidong Chen
[-- Attachment #1: Type: text/plain, Size: 1491 bytes --]
On Thu, May 03, 2018 at 12:26:56PM -0600, Jason Gunthorpe wrote:
> On Thu, May 03, 2018 at 09:12:35PM +0300, Leon Romanovsky wrote:
> > On Thu, May 03, 2018 at 09:33:10AM -0600, Jason Gunthorpe wrote:
> > > On Thu, May 03, 2018 at 10:04:34PM +0800, Lidong Chen wrote:
> > > > The userspace may invoke ibv_reg_mr and ibv_dereg_mr by different threads.
> > > > If when ibv_dereg_mr invoke and the thread which invoked ibv_reg_mr has
> > > > exited, get_pid_task will return NULL, ib_umem_release does not decrease
> > > > mm->pinned_vm. This patch fixes it by use tgid.
> > > >
> > > > Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> > > > drivers/infiniband/core/umem.c | 12 ++++++------
> > > > include/rdma/ib_umem.h | 2 +-
> > > > 2 files changed, 7 insertions(+), 7 deletions(-)
> > >
> > > Why are we even using a struct pid for this? Does anyone know?
> > >
> >
> > Can it be related to "fork" support?
>
> Not sure..
>
> Ideally we want to hold the struct mm, but we can't hold it long
> term, so pid is a surrogate for that.
>
> > > I'm surprised that struct task isn't held in the struct ib_umem..
> > >
> >
> > I think that this code can be removed and all accesses to mm_struct can
> > be done with "current->mm".
>
> That sounds wrong for fork support, as the mm used in destroy MUST
> exactly match the mm used in create..
>
> How does this accounting work in fork anyhow?
We are not supporting fork, so this is why I proposed to remove it.
Thanks
>
> Jason
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] IB/umem: use tgid instead of pid in ib_umem structure
2018-05-03 18:43 ` Leon Romanovsky
@ 2018-05-03 22:01 ` Jason Gunthorpe
2018-05-04 8:32 ` 858585 jemmy
0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2018-05-03 22:01 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Lidong Chen, dledford, akpm, qing.huang, artemyko,
dan.j.williams, linux-rdma, linux-kernel, adido, galsha, aviadye,
Lidong Chen
On Thu, May 03, 2018 at 09:43:01PM +0300, Leon Romanovsky wrote:
> On Thu, May 03, 2018 at 12:26:56PM -0600, Jason Gunthorpe wrote:
> > On Thu, May 03, 2018 at 09:12:35PM +0300, Leon Romanovsky wrote:
> > > On Thu, May 03, 2018 at 09:33:10AM -0600, Jason Gunthorpe wrote:
> > > > On Thu, May 03, 2018 at 10:04:34PM +0800, Lidong Chen wrote:
> > > > > The userspace may invoke ibv_reg_mr and ibv_dereg_mr by different threads.
> > > > > If when ibv_dereg_mr invoke and the thread which invoked ibv_reg_mr has
> > > > > exited, get_pid_task will return NULL, ib_umem_release does not decrease
> > > > > mm->pinned_vm. This patch fixes it by use tgid.
> > > > >
> > > > > Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> > > > > drivers/infiniband/core/umem.c | 12 ++++++------
> > > > > include/rdma/ib_umem.h | 2 +-
> > > > > 2 files changed, 7 insertions(+), 7 deletions(-)
> > > >
> > > > Why are we even using a struct pid for this? Does anyone know?
> > > >
> > >
> > > Can it be related to "fork" support?
> >
> > Not sure..
> >
> > Ideally we want to hold the struct mm, but we can't hold it long
> > term, so pid is a surrogate for that.
> >
> > > > I'm surprised that struct task isn't held in the struct ib_umem..
> > > >
> > >
> > > I think that this code can be removed and all accesses to mm_struct can
> > > be done with "current->mm".
> >
> > That sounds wrong for fork support, as the mm used in destroy MUST
> > exactly match the mm used in create..
> >
> > How does this accounting work in fork anyhow?
>
> We are not supporting fork, so this is why I proposed to remove it.
Er, the new kabi certainly can call reg and dereg across a fork
Jason
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] IB/umem: use tgid instead of pid in ib_umem structure
2018-05-03 15:33 ` Jason Gunthorpe
2018-05-03 18:12 ` Leon Romanovsky
@ 2018-05-04 2:41 ` 858585 jemmy
2018-05-04 3:14 ` 858585 jemmy
2 siblings, 0 replies; 16+ messages in thread
From: 858585 jemmy @ 2018-05-04 2:41 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: dledford, akpm, qing.huang, leon, artemyko, dan.j.williams,
linux-rdma, linux-kernel, adido, Gal Shachaf, Aviad Yehezkel,
Lidong Chen
On Thu, May 3, 2018 at 11:33 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> On Thu, May 03, 2018 at 10:04:34PM +0800, Lidong Chen wrote:
>> The userspace may invoke ibv_reg_mr and ibv_dereg_mr by different threads.
>> If when ibv_dereg_mr invoke and the thread which invoked ibv_reg_mr has
>> exited, get_pid_task will return NULL, ib_umem_release does not decrease
>> mm->pinned_vm. This patch fixes it by use tgid.
>>
>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>> ---
>> drivers/infiniband/core/umem.c | 12 ++++++------
>> include/rdma/ib_umem.h | 2 +-
>> 2 files changed, 7 insertions(+), 7 deletions(-)
>
> Why are we even using a struct pid for this? Does anyone know?
>
> I'm surprised that struct task isn't held in the struct ib_umem..
>
> Is group_leader really OK and always guarenteed to return the same
> struct mm?? For some reason I have this recollection that the leader
> can change under some situation..
I read kernel code, it seems that the userspace thread can guarantee
to return the same struct mm.
in the copy_process function, it checks the CLONE_THREAD,
CLONE_SIGHAND, CLONE_VM must use
at the same time.
And I do not find any other part code change the mm and group_leader
for userspace thread.
>
>> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
>> index 9a4e899..8813ba5 100644
>> --- a/drivers/infiniband/core/umem.c
>> +++ b/drivers/infiniband/core/umem.c
>> @@ -119,7 +119,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>> umem->length = size;
>> umem->address = addr;
>> umem->page_shift = PAGE_SHIFT;
>> - umem->pid = get_task_pid(current, PIDTYPE_PID);
>> + umem->tgid = get_task_pid(current->group_leader, PIDTYPE_PID);
>> /*
>> * We ask for writable memory if any of the following
>> * access flags are set. "Local write" and "remote write"
>> @@ -132,7 +132,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>> IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_MW_BIND));
>>
>> if (access & IB_ACCESS_ON_DEMAND) {
>> - put_pid(umem->pid);
>> + put_pid(umem->tgid);
>> ret = ib_umem_odp_get(context, umem, access);
>> if (ret) {
>> kfree(umem);
>> @@ -148,7 +148,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>>
>> page_list = (struct page **) __get_free_page(GFP_KERNEL);
>> if (!page_list) {
>> - put_pid(umem->pid);
>> + put_pid(umem->tgid);
>> kfree(umem);
>> return ERR_PTR(-ENOMEM);
>> }
>> @@ -231,7 +231,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>> if (ret < 0) {
>> if (need_release)
>> __ib_umem_release(context->device, umem, 0);
>> - put_pid(umem->pid);
>> + put_pid(umem->tgid);
>> kfree(umem);
>> } else
>> current->mm->pinned_vm = locked;
>> @@ -274,8 +274,8 @@ void ib_umem_release(struct ib_umem *umem)
>>
>> __ib_umem_release(umem->context->device, umem, 1);
>>
>> - task = get_pid_task(umem->pid, PIDTYPE_PID);
>> - put_pid(umem->pid);
>> + task = get_pid_task(umem->tgid, PIDTYPE_PID);
>> + put_pid(umem->tgid);
>> if (!task)
>> goto out;
>> mm = get_task_mm(task);
>> diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
>> index 23159dd..2398849 100644
>> --- a/include/rdma/ib_umem.h
>> +++ b/include/rdma/ib_umem.h
>> @@ -48,7 +48,7 @@ struct ib_umem {
>> int writable;
>> int hugetlb;
>> struct work_struct work;
>> - struct pid *pid;
>> + struct pid *tgid;
>> struct mm_struct *mm;
>> unsigned long diff;
>> struct ib_umem_odp *odp_data;
>> --
>> 1.8.3.1
>>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] IB/umem: use tgid instead of pid in ib_umem structure
2018-05-03 15:33 ` Jason Gunthorpe
2018-05-03 18:12 ` Leon Romanovsky
2018-05-04 2:41 ` 858585 jemmy
@ 2018-05-04 3:14 ` 858585 jemmy
2018-05-04 8:51 ` 858585 jemmy
2 siblings, 1 reply; 16+ messages in thread
From: 858585 jemmy @ 2018-05-04 3:14 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: dledford, akpm, qing.huang, leon, artemyko, dan.j.williams,
linux-rdma, linux-kernel, adido, Gal Shachaf, Aviad Yehezkel,
Lidong Chen
On Thu, May 3, 2018 at 11:33 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> On Thu, May 03, 2018 at 10:04:34PM +0800, Lidong Chen wrote:
>> The userspace may invoke ibv_reg_mr and ibv_dereg_mr by different threads.
>> If when ibv_dereg_mr invoke and the thread which invoked ibv_reg_mr has
>> exited, get_pid_task will return NULL, ib_umem_release does not decrease
>> mm->pinned_vm. This patch fixes it by use tgid.
>>
>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>> ---
>> drivers/infiniband/core/umem.c | 12 ++++++------
>> include/rdma/ib_umem.h | 2 +-
>> 2 files changed, 7 insertions(+), 7 deletions(-)
>
> Why are we even using a struct pid for this? Does anyone know?
commit 87773dd56d5405ac28119fcfadacefd35877c18f add pid in ib_umem structure.
and the comment has such information:
Later a different process with a different mm_struct than the one that
allocated the ib_umem struct
ends up releasing it which results in decrementing the new processes
mm->pinned_vm count past
zero and wrapping.
>
> I'm surprised that struct task isn't held in the struct ib_umem..
>
> Is group_leader really OK and always guarenteed to return the same
> struct mm?? For some reason I have this recollection that the leader
> can change under some situation..
>
>> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
>> index 9a4e899..8813ba5 100644
>> --- a/drivers/infiniband/core/umem.c
>> +++ b/drivers/infiniband/core/umem.c
>> @@ -119,7 +119,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>> umem->length = size;
>> umem->address = addr;
>> umem->page_shift = PAGE_SHIFT;
>> - umem->pid = get_task_pid(current, PIDTYPE_PID);
>> + umem->tgid = get_task_pid(current->group_leader, PIDTYPE_PID);
>> /*
>> * We ask for writable memory if any of the following
>> * access flags are set. "Local write" and "remote write"
>> @@ -132,7 +132,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>> IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_MW_BIND));
>>
>> if (access & IB_ACCESS_ON_DEMAND) {
>> - put_pid(umem->pid);
>> + put_pid(umem->tgid);
>> ret = ib_umem_odp_get(context, umem, access);
>> if (ret) {
>> kfree(umem);
>> @@ -148,7 +148,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>>
>> page_list = (struct page **) __get_free_page(GFP_KERNEL);
>> if (!page_list) {
>> - put_pid(umem->pid);
>> + put_pid(umem->tgid);
>> kfree(umem);
>> return ERR_PTR(-ENOMEM);
>> }
>> @@ -231,7 +231,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>> if (ret < 0) {
>> if (need_release)
>> __ib_umem_release(context->device, umem, 0);
>> - put_pid(umem->pid);
>> + put_pid(umem->tgid);
>> kfree(umem);
>> } else
>> current->mm->pinned_vm = locked;
>> @@ -274,8 +274,8 @@ void ib_umem_release(struct ib_umem *umem)
>>
>> __ib_umem_release(umem->context->device, umem, 1);
>>
>> - task = get_pid_task(umem->pid, PIDTYPE_PID);
>> - put_pid(umem->pid);
>> + task = get_pid_task(umem->tgid, PIDTYPE_PID);
>> + put_pid(umem->tgid);
>> if (!task)
>> goto out;
>> mm = get_task_mm(task);
>> diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
>> index 23159dd..2398849 100644
>> --- a/include/rdma/ib_umem.h
>> +++ b/include/rdma/ib_umem.h
>> @@ -48,7 +48,7 @@ struct ib_umem {
>> int writable;
>> int hugetlb;
>> struct work_struct work;
>> - struct pid *pid;
>> + struct pid *tgid;
>> struct mm_struct *mm;
>> unsigned long diff;
>> struct ib_umem_odp *odp_data;
>> --
>> 1.8.3.1
>>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] IB/umem: use tgid instead of pid in ib_umem structure
2018-05-03 22:01 ` Jason Gunthorpe
@ 2018-05-04 8:32 ` 858585 jemmy
2018-05-04 13:39 ` Leon Romanovsky
0 siblings, 1 reply; 16+ messages in thread
From: 858585 jemmy @ 2018-05-04 8:32 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Leon Romanovsky, dledford, akpm, qing.huang, artemyko,
dan.j.williams, linux-rdma, linux-kernel, adido, Gal Shachaf,
Aviad Yehezkel, Lidong Chen
On Fri, May 4, 2018 at 6:01 AM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> On Thu, May 03, 2018 at 09:43:01PM +0300, Leon Romanovsky wrote:
>> On Thu, May 03, 2018 at 12:26:56PM -0600, Jason Gunthorpe wrote:
>> > On Thu, May 03, 2018 at 09:12:35PM +0300, Leon Romanovsky wrote:
>> > > On Thu, May 03, 2018 at 09:33:10AM -0600, Jason Gunthorpe wrote:
>> > > > On Thu, May 03, 2018 at 10:04:34PM +0800, Lidong Chen wrote:
>> > > > > The userspace may invoke ibv_reg_mr and ibv_dereg_mr by different threads.
>> > > > > If when ibv_dereg_mr invoke and the thread which invoked ibv_reg_mr has
>> > > > > exited, get_pid_task will return NULL, ib_umem_release does not decrease
>> > > > > mm->pinned_vm. This patch fixes it by use tgid.
>> > > > >
>> > > > > Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>> > > > > drivers/infiniband/core/umem.c | 12 ++++++------
>> > > > > include/rdma/ib_umem.h | 2 +-
>> > > > > 2 files changed, 7 insertions(+), 7 deletions(-)
>> > > >
>> > > > Why are we even using a struct pid for this? Does anyone know?
>> > > >
>> > >
>> > > Can it be related to "fork" support?
>> >
>> > Not sure..
>> >
>> > Ideally we want to hold the struct mm, but we can't hold it long
>> > term, so pid is a surrogate for that.
>> >
>> > > > I'm surprised that struct task isn't held in the struct ib_umem..
>> > > >
>> > >
>> > > I think that this code can be removed and all accesses to mm_struct can
>> > > be done with "current->mm".
>> >
>> > That sounds wrong for fork support, as the mm used in destroy MUST
>> > exactly match the mm used in create..
>> >
>> > How does this accounting work in fork anyhow?
>>
>> We are not supporting fork, so this is why I proposed to remove it.
>
> Er, the new kabi certainly can call reg and dereg across a fork
what is the expect behavior after fork?
I write a test code, the dereg just return EACCES in the child
process. and have no effect.
>
> Jason
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] IB/umem: use tgid instead of pid in ib_umem structure
2018-05-04 3:14 ` 858585 jemmy
@ 2018-05-04 8:51 ` 858585 jemmy
2018-05-04 18:23 ` Jason Gunthorpe
0 siblings, 1 reply; 16+ messages in thread
From: 858585 jemmy @ 2018-05-04 8:51 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: dledford, akpm, qing.huang, Leon Romanovsky, artemyko,
dan.j.williams, linux-rdma, linux-kernel, adido, Gal Shachaf,
Aviad Yehezkel, Lidong Chen
On Fri, May 4, 2018 at 11:14 AM, 858585 jemmy <jemmy858585@gmail.com> wrote:
> On Thu, May 3, 2018 at 11:33 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>> On Thu, May 03, 2018 at 10:04:34PM +0800, Lidong Chen wrote:
>>> The userspace may invoke ibv_reg_mr and ibv_dereg_mr by different threads.
>>> If when ibv_dereg_mr invoke and the thread which invoked ibv_reg_mr has
>>> exited, get_pid_task will return NULL, ib_umem_release does not decrease
>>> mm->pinned_vm. This patch fixes it by use tgid.
>>>
>>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>>> ---
>>> drivers/infiniband/core/umem.c | 12 ++++++------
>>> include/rdma/ib_umem.h | 2 +-
>>> 2 files changed, 7 insertions(+), 7 deletions(-)
>>
>> Why are we even using a struct pid for this? Does anyone know?
>
> commit 87773dd56d5405ac28119fcfadacefd35877c18f add pid in ib_umem structure.
>
> and the comment has such information:
> Later a different process with a different mm_struct than the one that
> allocated the ib_umem struct
> ends up releasing it which results in decrementing the new processes
> mm->pinned_vm count past
> zero and wrapping.
>
I think a different process should not have the permission to release ib_umem.
so maybe the reason is not a different process?
can ib_umem_release be invoked in interrupt context?
>>
>> I'm surprised that struct task isn't held in the struct ib_umem..
>>
>> Is group_leader really OK and always guarenteed to return the same
>> struct mm?? For some reason I have this recollection that the leader
>> can change under some situation..
>>
>>> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
>>> index 9a4e899..8813ba5 100644
>>> --- a/drivers/infiniband/core/umem.c
>>> +++ b/drivers/infiniband/core/umem.c
>>> @@ -119,7 +119,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>>> umem->length = size;
>>> umem->address = addr;
>>> umem->page_shift = PAGE_SHIFT;
>>> - umem->pid = get_task_pid(current, PIDTYPE_PID);
>>> + umem->tgid = get_task_pid(current->group_leader, PIDTYPE_PID);
>>> /*
>>> * We ask for writable memory if any of the following
>>> * access flags are set. "Local write" and "remote write"
>>> @@ -132,7 +132,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>>> IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_MW_BIND));
>>>
>>> if (access & IB_ACCESS_ON_DEMAND) {
>>> - put_pid(umem->pid);
>>> + put_pid(umem->tgid);
>>> ret = ib_umem_odp_get(context, umem, access);
>>> if (ret) {
>>> kfree(umem);
>>> @@ -148,7 +148,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>>>
>>> page_list = (struct page **) __get_free_page(GFP_KERNEL);
>>> if (!page_list) {
>>> - put_pid(umem->pid);
>>> + put_pid(umem->tgid);
>>> kfree(umem);
>>> return ERR_PTR(-ENOMEM);
>>> }
>>> @@ -231,7 +231,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>>> if (ret < 0) {
>>> if (need_release)
>>> __ib_umem_release(context->device, umem, 0);
>>> - put_pid(umem->pid);
>>> + put_pid(umem->tgid);
>>> kfree(umem);
>>> } else
>>> current->mm->pinned_vm = locked;
>>> @@ -274,8 +274,8 @@ void ib_umem_release(struct ib_umem *umem)
>>>
>>> __ib_umem_release(umem->context->device, umem, 1);
>>>
>>> - task = get_pid_task(umem->pid, PIDTYPE_PID);
>>> - put_pid(umem->pid);
>>> + task = get_pid_task(umem->tgid, PIDTYPE_PID);
>>> + put_pid(umem->tgid);
>>> if (!task)
>>> goto out;
>>> mm = get_task_mm(task);
>>> diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
>>> index 23159dd..2398849 100644
>>> --- a/include/rdma/ib_umem.h
>>> +++ b/include/rdma/ib_umem.h
>>> @@ -48,7 +48,7 @@ struct ib_umem {
>>> int writable;
>>> int hugetlb;
>>> struct work_struct work;
>>> - struct pid *pid;
>>> + struct pid *tgid;
>>> struct mm_struct *mm;
>>> unsigned long diff;
>>> struct ib_umem_odp *odp_data;
>>> --
>>> 1.8.3.1
>>>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] IB/umem: use tgid instead of pid in ib_umem structure
2018-05-04 8:32 ` 858585 jemmy
@ 2018-05-04 13:39 ` Leon Romanovsky
2018-05-04 15:14 ` lidongchen(陈立东)
0 siblings, 1 reply; 16+ messages in thread
From: Leon Romanovsky @ 2018-05-04 13:39 UTC (permalink / raw)
To: 858585 jemmy
Cc: Jason Gunthorpe, dledford, akpm, qing.huang, artemyko,
dan.j.williams, linux-rdma, linux-kernel, adido, Gal Shachaf,
Aviad Yehezkel, Lidong Chen
[-- Attachment #1: Type: text/plain, Size: 2314 bytes --]
On Fri, May 04, 2018 at 04:32:38PM +0800, 858585 jemmy wrote:
> On Fri, May 4, 2018 at 6:01 AM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > On Thu, May 03, 2018 at 09:43:01PM +0300, Leon Romanovsky wrote:
> >> On Thu, May 03, 2018 at 12:26:56PM -0600, Jason Gunthorpe wrote:
> >> > On Thu, May 03, 2018 at 09:12:35PM +0300, Leon Romanovsky wrote:
> >> > > On Thu, May 03, 2018 at 09:33:10AM -0600, Jason Gunthorpe wrote:
> >> > > > On Thu, May 03, 2018 at 10:04:34PM +0800, Lidong Chen wrote:
> >> > > > > The userspace may invoke ibv_reg_mr and ibv_dereg_mr by different threads.
> >> > > > > If when ibv_dereg_mr invoke and the thread which invoked ibv_reg_mr has
> >> > > > > exited, get_pid_task will return NULL, ib_umem_release does not decrease
> >> > > > > mm->pinned_vm. This patch fixes it by use tgid.
> >> > > > >
> >> > > > > Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> >> > > > > drivers/infiniband/core/umem.c | 12 ++++++------
> >> > > > > include/rdma/ib_umem.h | 2 +-
> >> > > > > 2 files changed, 7 insertions(+), 7 deletions(-)
> >> > > >
> >> > > > Why are we even using a struct pid for this? Does anyone know?
> >> > > >
> >> > >
> >> > > Can it be related to "fork" support?
> >> >
> >> > Not sure..
> >> >
> >> > Ideally we want to hold the struct mm, but we can't hold it long
> >> > term, so pid is a surrogate for that.
> >> >
> >> > > > I'm surprised that struct task isn't held in the struct ib_umem..
> >> > > >
> >> > >
> >> > > I think that this code can be removed and all accesses to mm_struct can
> >> > > be done with "current->mm".
> >> >
> >> > That sounds wrong for fork support, as the mm used in destroy MUST
> >> > exactly match the mm used in create..
> >> >
> >> > How does this accounting work in fork anyhow?
> >>
> >> We are not supporting fork, so this is why I proposed to remove it.
> >
> > Er, the new kabi certainly can call reg and dereg across a fork
>
> what is the expect behavior after fork?
> I write a test code, the dereg just return EACCES in the child
> process. and have no effect.
Did you do reg/dereg over write() interface? If yes, this is expected
behaviour of "not-supported fork()". A couple of months/years ago, your
test program would work, but we closed this option due to security
constraints.
Thanks
>
> >
> > Jason
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] IB/umem: use tgid instead of pid in ib_umem structure
2018-05-04 13:39 ` Leon Romanovsky
@ 2018-05-04 15:14 ` lidongchen(陈立东)
0 siblings, 0 replies; 16+ messages in thread
From: lidongchen(陈立东) @ 2018-05-04 15:14 UTC (permalink / raw)
To: Leon Romanovsky
Cc: 858585 jemmy, Jason Gunthorpe, dledford, akpm, qing.huang,
artemyko, dan.j.williams, linux-rdma, linux-kernel, adido,
Gal Shachaf, Aviad Yehezkel
> 在 2018年5月4日,21:39,Leon Romanovsky <leon@kernel.org> 写道:
>
>> On Fri, May 04, 2018 at 04:32:38PM +0800, 858585 jemmy wrote:
>>> On Fri, May 4, 2018 at 6:01 AM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>>>> On Thu, May 03, 2018 at 09:43:01PM +0300, Leon Romanovsky wrote:
>>>>> On Thu, May 03, 2018 at 12:26:56PM -0600, Jason Gunthorpe wrote:
>>>>>> On Thu, May 03, 2018 at 09:12:35PM +0300, Leon Romanovsky wrote:
>>>>>>> On Thu, May 03, 2018 at 09:33:10AM -0600, Jason Gunthorpe wrote:
>>>>>>>> On Thu, May 03, 2018 at 10:04:34PM +0800, Lidong Chen wrote:
>>>>>>>> The userspace may invoke ibv_reg_mr and ibv_dereg_mr by different threads.
>>>>>>>> If when ibv_dereg_mr invoke and the thread which invoked ibv_reg_mr has
>>>>>>>> exited, get_pid_task will return NULL, ib_umem_release does not decrease
>>>>>>>> mm->pinned_vm. This patch fixes it by use tgid.
>>>>>>>>
>>>>>>>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>>>>>>>> drivers/infiniband/core/umem.c | 12 ++++++------
>>>>>>>> include/rdma/ib_umem.h | 2 +-
>>>>>>>> 2 files changed, 7 insertions(+), 7 deletions(-)
>>>>>>>
>>>>>>> Why are we even using a struct pid for this? Does anyone know?
>>>>>>>
>>>>>>
>>>>>> Can it be related to "fork" support?
>>>>>
>>>>> Not sure..
>>>>>
>>>>> Ideally we want to hold the struct mm, but we can't hold it long
>>>>> term, so pid is a surrogate for that.
>>>>>
>>>>>>> I'm surprised that struct task isn't held in the struct ib_umem..
>>>>>>>
>>>>>>
>>>>>> I think that this code can be removed and all accesses to mm_struct can
>>>>>> be done with "current->mm".
>>>>>
>>>>> That sounds wrong for fork support, as the mm used in destroy MUST
>>>>> exactly match the mm used in create..
>>>>>
>>>>> How does this accounting work in fork anyhow?
>>>>
>>>> We are not supporting fork, so this is why I proposed to remove it.
>>>
>>> Er, the new kabi certainly can call reg and dereg across a fork
>>
>> what is the expect behavior after fork?
>> I write a test code, the dereg just return EACCES in the child
>> process. and have no effect.
>
> Did you do reg/dereg over write() interface? If yes, this is expected
> behaviour of "not-supported fork()". A couple of months/years ago, your
> test program would work, but we closed this option due to security
> constraints.
the parent process call ibv_reg_mr, and the child process call ibv_dereg_mr.
If fork is not supported now, so use tgid to get mm structure is fine for multithread.
>
> Thanks
>
>>
>>>
>>> Jason
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] IB/umem: use tgid instead of pid in ib_umem structure
2018-05-04 8:51 ` 858585 jemmy
@ 2018-05-04 18:23 ` Jason Gunthorpe
2018-05-07 1:38 ` 858585 jemmy
0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2018-05-04 18:23 UTC (permalink / raw)
To: 858585 jemmy
Cc: dledford, akpm, qing.huang, Leon Romanovsky, artemyko,
dan.j.williams, linux-rdma, linux-kernel, adido, Gal Shachaf,
Aviad Yehezkel, Lidong Chen
On Fri, May 04, 2018 at 04:51:15PM +0800, 858585 jemmy wrote:
> On Fri, May 4, 2018 at 11:14 AM, 858585 jemmy <jemmy858585@gmail.com> wrote:
> > On Thu, May 3, 2018 at 11:33 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >> On Thu, May 03, 2018 at 10:04:34PM +0800, Lidong Chen wrote:
> >>> The userspace may invoke ibv_reg_mr and ibv_dereg_mr by different threads.
> >>> If when ibv_dereg_mr invoke and the thread which invoked ibv_reg_mr has
> >>> exited, get_pid_task will return NULL, ib_umem_release does not decrease
> >>> mm->pinned_vm. This patch fixes it by use tgid.
> >>>
> >>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> >>> drivers/infiniband/core/umem.c | 12 ++++++------
> >>> include/rdma/ib_umem.h | 2 +-
> >>> 2 files changed, 7 insertions(+), 7 deletions(-)
> >>
> >> Why are we even using a struct pid for this? Does anyone know?
> >
> > commit 87773dd56d5405ac28119fcfadacefd35877c18f add pid in ib_umem structure.
> >
> > and the comment has such information:
> > Later a different process with a different mm_struct than the one that
> > allocated the ib_umem struct
> > ends up releasing it which results in decrementing the new processes
> > mm->pinned_vm count past
> > zero and wrapping.
>
> I think a different process should not have the permission to release ib_umem.
> so maybe the reason is not a different process?
> can ib_umem_release be invoked in interrupt context?
We plan to restore fork support and add some way to share MRs between
processes, so we must consider having a different process release the
umem than acquired it.
Jason
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] IB/umem: use tgid instead of pid in ib_umem structure
2018-05-04 18:23 ` Jason Gunthorpe
@ 2018-05-07 1:38 ` 858585 jemmy
2018-05-08 6:30 ` Jason Gunthorpe
0 siblings, 1 reply; 16+ messages in thread
From: 858585 jemmy @ 2018-05-07 1:38 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: dledford, akpm, qing.huang, Leon Romanovsky, artemyko,
dan.j.williams, linux-rdma, linux-kernel, adido, Gal Shachaf,
Aviad Yehezkel, Lidong Chen
On Sat, May 5, 2018 at 2:23 AM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> On Fri, May 04, 2018 at 04:51:15PM +0800, 858585 jemmy wrote:
>> On Fri, May 4, 2018 at 11:14 AM, 858585 jemmy <jemmy858585@gmail.com> wrote:
>> > On Thu, May 3, 2018 at 11:33 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>> >> On Thu, May 03, 2018 at 10:04:34PM +0800, Lidong Chen wrote:
>> >>> The userspace may invoke ibv_reg_mr and ibv_dereg_mr by different threads.
>> >>> If when ibv_dereg_mr invoke and the thread which invoked ibv_reg_mr has
>> >>> exited, get_pid_task will return NULL, ib_umem_release does not decrease
>> >>> mm->pinned_vm. This patch fixes it by use tgid.
>> >>>
>> >>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>> >>> drivers/infiniband/core/umem.c | 12 ++++++------
>> >>> include/rdma/ib_umem.h | 2 +-
>> >>> 2 files changed, 7 insertions(+), 7 deletions(-)
>> >>
>> >> Why are we even using a struct pid for this? Does anyone know?
>> >
>> > commit 87773dd56d5405ac28119fcfadacefd35877c18f add pid in ib_umem structure.
>> >
>> > and the comment has such information:
>> > Later a different process with a different mm_struct than the one that
>> > allocated the ib_umem struct
>> > ends up releasing it which results in decrementing the new processes
>> > mm->pinned_vm count past
>> > zero and wrapping.
>>
>> I think a different process should not have the permission to release ib_umem.
>> so maybe the reason is not a different process?
>> can ib_umem_release be invoked in interrupt context?
>
> We plan to restore fork support and add some way to share MRs between
> processes, so we must consider having a different process release the
> umem than acquired it.
If restore fork support, what is the expected behavior?
If parent process pinned_vm is x, what is the child process pinned_vm
value after fork? It reset to zero now.
If the parent process call ibv_dereg_mr after fork, should the child
process decrease pinned_vm?
If the child process call ibv_dereg_mr after fork, should the parent
process decrease pinned_vm?
>
> Jason
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] IB/umem: use tgid instead of pid in ib_umem structure
2018-05-07 1:38 ` 858585 jemmy
@ 2018-05-08 6:30 ` Jason Gunthorpe
2018-05-08 8:32 ` 858585 jemmy
0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2018-05-08 6:30 UTC (permalink / raw)
To: 858585 jemmy
Cc: dledford, akpm, qing.huang, Leon Romanovsky, artemyko,
dan.j.williams, linux-rdma, linux-kernel, adido, Gal Shachaf,
Aviad Yehezkel, Lidong Chen
On Mon, May 07, 2018 at 09:38:53AM +0800, 858585 jemmy wrote:
> On Sat, May 5, 2018 at 2:23 AM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > On Fri, May 04, 2018 at 04:51:15PM +0800, 858585 jemmy wrote:
> >> On Fri, May 4, 2018 at 11:14 AM, 858585 jemmy <jemmy858585@gmail.com> wrote:
> >> > On Thu, May 3, 2018 at 11:33 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >> >> On Thu, May 03, 2018 at 10:04:34PM +0800, Lidong Chen wrote:
> >> >>> The userspace may invoke ibv_reg_mr and ibv_dereg_mr by different threads.
> >> >>> If when ibv_dereg_mr invoke and the thread which invoked ibv_reg_mr has
> >> >>> exited, get_pid_task will return NULL, ib_umem_release does not decrease
> >> >>> mm->pinned_vm. This patch fixes it by use tgid.
> >> >>>
> >> >>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> >> >>> drivers/infiniband/core/umem.c | 12 ++++++------
> >> >>> include/rdma/ib_umem.h | 2 +-
> >> >>> 2 files changed, 7 insertions(+), 7 deletions(-)
> >> >>
> >> >> Why are we even using a struct pid for this? Does anyone know?
> >> >
> >> > commit 87773dd56d5405ac28119fcfadacefd35877c18f add pid in ib_umem structure.
> >> >
> >> > and the comment has such information:
> >> > Later a different process with a different mm_struct than the one that
> >> > allocated the ib_umem struct
> >> > ends up releasing it which results in decrementing the new processes
> >> > mm->pinned_vm count past
> >> > zero and wrapping.
> >>
> >> I think a different process should not have the permission to release ib_umem.
> >> so maybe the reason is not a different process?
> >> can ib_umem_release be invoked in interrupt context?
> >
> > We plan to restore fork support and add some way to share MRs between
> > processes, so we must consider having a different process release the
> > umem than acquired it.
>
> If restore fork support, what is the expected behavior?
> If parent process pinned_vm is x, what is the child process pinned_vm
> value after fork? It reset to zero now.
> If the parent process call ibv_dereg_mr after fork, should the child
> process decrease pinned_vm?
> If the child process call ibv_dereg_mr after fork, should the parent
> process decrease pinned_vm?
If I recall the purpose of accessing the MM during de-register is to
undo the pinned pages change (pinned_vm) that register performed.
So, the semantic is simple, during deregister we must access excatly
the same MM that was used during register and undo the change to
pinned_vm.
The approach should be to find the most reliably way to hold a
reference to the MM that was used during register.
Apparently we can't just hold a ref on the mm (according to mm_get's
comment at least)
tgid is clearly a better indirect reference to the mm than pid (pid is
so obviously wrong)
But I am wondering why not just hold struct task here instead of tgid?
Isn't task->mm going to be more reliably than tgid->task->mm ??
Jason
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] IB/umem: use tgid instead of pid in ib_umem structure
2018-05-08 6:30 ` Jason Gunthorpe
@ 2018-05-08 8:32 ` 858585 jemmy
0 siblings, 0 replies; 16+ messages in thread
From: 858585 jemmy @ 2018-05-08 8:32 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: dledford, akpm, qing.huang, Leon Romanovsky, artemyko,
dan.j.williams, linux-rdma, linux-kernel, adido, Gal Shachaf,
Aviad Yehezkel, Lidong Chen
On Tue, May 8, 2018 at 2:30 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> On Mon, May 07, 2018 at 09:38:53AM +0800, 858585 jemmy wrote:
>> On Sat, May 5, 2018 at 2:23 AM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>> > On Fri, May 04, 2018 at 04:51:15PM +0800, 858585 jemmy wrote:
>> >> On Fri, May 4, 2018 at 11:14 AM, 858585 jemmy <jemmy858585@gmail.com> wrote:
>> >> > On Thu, May 3, 2018 at 11:33 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>> >> >> On Thu, May 03, 2018 at 10:04:34PM +0800, Lidong Chen wrote:
>> >> >>> The userspace may invoke ibv_reg_mr and ibv_dereg_mr by different threads.
>> >> >>> If when ibv_dereg_mr invoke and the thread which invoked ibv_reg_mr has
>> >> >>> exited, get_pid_task will return NULL, ib_umem_release does not decrease
>> >> >>> mm->pinned_vm. This patch fixes it by use tgid.
>> >> >>>
>> >> >>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>> >> >>> drivers/infiniband/core/umem.c | 12 ++++++------
>> >> >>> include/rdma/ib_umem.h | 2 +-
>> >> >>> 2 files changed, 7 insertions(+), 7 deletions(-)
>> >> >>
>> >> >> Why are we even using a struct pid for this? Does anyone know?
>> >> >
>> >> > commit 87773dd56d5405ac28119fcfadacefd35877c18f add pid in ib_umem structure.
>> >> >
>> >> > and the comment has such information:
>> >> > Later a different process with a different mm_struct than the one that
>> >> > allocated the ib_umem struct
>> >> > ends up releasing it which results in decrementing the new processes
>> >> > mm->pinned_vm count past
>> >> > zero and wrapping.
>> >>
>> >> I think a different process should not have the permission to release ib_umem.
>> >> so maybe the reason is not a different process?
>> >> can ib_umem_release be invoked in interrupt context?
>> >
>> > We plan to restore fork support and add some way to share MRs between
>> > processes, so we must consider having a different process release the
>> > umem than acquired it.
>>
>> If restore fork support, what is the expected behavior?
>> If parent process pinned_vm is x, what is the child process pinned_vm
>> value after fork? It reset to zero now.
>> If the parent process call ibv_dereg_mr after fork, should the child
>> process decrease pinned_vm?
>> If the child process call ibv_dereg_mr after fork, should the parent
>> process decrease pinned_vm?
>
> If I recall the purpose of accessing the MM during de-register is to
> undo the pinned pages change (pinned_vm) that register performed.
>
> So, the semantic is simple, during deregister we must access excatly
> the same MM that was used during register and undo the change to
> pinned_vm.
>
> The approach should be to find the most reliably way to hold a
> reference to the MM that was used during register.
>
> Apparently we can't just hold a ref on the mm (according to mm_get's
> comment at least)
>
> tgid is clearly a better indirect reference to the mm than pid (pid is
> so obviously wrong)
>
> But I am wondering why not just hold struct task here instead of tgid?
> Isn't task->mm going to be more reliably than tgid->task->mm ??
I think get_task_struct(current->group_leader) is also work.
But I find ib_ucontext structure already have a tgid field, so I think this not
necessary to ib_umem have tgid again. we can use ib_ucontext->tgid.
I will send a v2 patch.
>
> Jason
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2018-05-08 8:32 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-03 14:04 [PATCH] IB/umem: use tgid instead of pid in ib_umem structure Lidong Chen
2018-05-03 15:33 ` Jason Gunthorpe
2018-05-03 18:12 ` Leon Romanovsky
2018-05-03 18:26 ` Jason Gunthorpe
2018-05-03 18:43 ` Leon Romanovsky
2018-05-03 22:01 ` Jason Gunthorpe
2018-05-04 8:32 ` 858585 jemmy
2018-05-04 13:39 ` Leon Romanovsky
2018-05-04 15:14 ` lidongchen(陈立东)
2018-05-04 2:41 ` 858585 jemmy
2018-05-04 3:14 ` 858585 jemmy
2018-05-04 8:51 ` 858585 jemmy
2018-05-04 18:23 ` Jason Gunthorpe
2018-05-07 1:38 ` 858585 jemmy
2018-05-08 6:30 ` Jason Gunthorpe
2018-05-08 8:32 ` 858585 jemmy
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).