LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH rdma-next] RDMA/mlx5: Avoid taking MRs from larger MR cache pools when a pool is empty
@ 2021-09-26  8:31 Leon Romanovsky
  2021-10-04 23:00 ` Jason Gunthorpe
  0 siblings, 1 reply; 3+ messages in thread
From: Leon Romanovsky @ 2021-09-26  8:31 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Aharon Landau, linux-kernel, linux-rdma

From: Aharon Landau <aharonl@nvidia.com>

Currently, if a cache entry is empty, the driver will try to take MRs
from larger cache entries. This behavior consumes a lot of memory.
In addition, when searching for an mkey in an entry, the entry is locked.
When using a multithreaded application with the old behavior, the threads
will block each other more often, which can hurt performance as can be
seen in the table below.

Therefore, avoid it by creating a new mkey when the requested cache entry
is empty.

The test was performed on a machine with
Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz 44 cores.

Here are the time measures for allocating MRs of 2^6 pages. The search in
the cache started from entry 6.

+------------+---------------------+---------------------+
|            |     Old behavior    |     New behavior    |
|            +----------+----------+----------+----------+
|            | 1 thread | 5 thread | 1 thread | 5 thread |
+============+==========+==========+==========+==========+
|  1,000 MRs |   14 ms  |   30 ms  |   14 ms  |   80 ms  |
+------------+----------+----------+----------+----------+
| 10,000 MRs |  135 ms  |   6 sec  |  173 ms  |  880 ms  |
+------------+----------+----------+----------+----------+
|100,000 MRs | 11.2 sec |  57 sec  | 1.74 sec |  8.8 sec |
+------------+----------+----------+----------+----------+

Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mlx5/mr.c | 26 +++++++++-----------------
 1 file changed, 9 insertions(+), 17 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 3be36ebbf67a..b4d2322e9ca5 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -600,29 +600,21 @@ struct mlx5_ib_mr *mlx5_mr_cache_alloc(struct mlx5_ib_dev *dev,
 /* Return a MR already available in the cache */
 static struct mlx5_ib_mr *get_cache_mr(struct mlx5_cache_ent *req_ent)
 {
-	struct mlx5_ib_dev *dev = req_ent->dev;
 	struct mlx5_ib_mr *mr = NULL;
 	struct mlx5_cache_ent *ent = req_ent;
 
-	/* Try larger MR pools from the cache to satisfy the allocation */
-	for (; ent != &dev->cache.ent[MR_CACHE_LAST_STD_ENTRY + 1]; ent++) {
-		mlx5_ib_dbg(dev, "order %u, cache index %zu\n", ent->order,
-			    ent - dev->cache.ent);
-
-		spin_lock_irq(&ent->lock);
-		if (!list_empty(&ent->head)) {
-			mr = list_first_entry(&ent->head, struct mlx5_ib_mr,
-					      list);
-			list_del(&mr->list);
-			ent->available_mrs--;
-			queue_adjust_cache_locked(ent);
-			spin_unlock_irq(&ent->lock);
-			mlx5_clear_mr(mr);
-			return mr;
-		}
+	spin_lock_irq(&ent->lock);
+	if (!list_empty(&ent->head)) {
+		mr = list_first_entry(&ent->head, struct mlx5_ib_mr, list);
+		list_del(&mr->list);
+		ent->available_mrs--;
 		queue_adjust_cache_locked(ent);
 		spin_unlock_irq(&ent->lock);
+		mlx5_clear_mr(mr);
+		return mr;
 	}
+	queue_adjust_cache_locked(ent);
+	spin_unlock_irq(&ent->lock);
 	req_ent->miss++;
 	return NULL;
 }
-- 
2.31.1


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH rdma-next] RDMA/mlx5: Avoid taking MRs from larger MR cache pools when a pool is empty
  2021-09-26  8:31 [PATCH rdma-next] RDMA/mlx5: Avoid taking MRs from larger MR cache pools when a pool is empty Leon Romanovsky
@ 2021-10-04 23:00 ` Jason Gunthorpe
  2021-10-06  9:30   ` Leon Romanovsky
  0 siblings, 1 reply; 3+ messages in thread
From: Jason Gunthorpe @ 2021-10-04 23:00 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: Doug Ledford, Aharon Landau, linux-kernel, linux-rdma

On Sun, Sep 26, 2021 at 11:31:43AM +0300, Leon Romanovsky wrote:
> From: Aharon Landau <aharonl@nvidia.com>
> 
> Currently, if a cache entry is empty, the driver will try to take MRs
> from larger cache entries. This behavior consumes a lot of memory.
> In addition, when searching for an mkey in an entry, the entry is locked.
> When using a multithreaded application with the old behavior, the threads
> will block each other more often, which can hurt performance as can be
> seen in the table below.
> 
> Therefore, avoid it by creating a new mkey when the requested cache entry
> is empty.
> 
> The test was performed on a machine with
> Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz 44 cores.
> 
> Here are the time measures for allocating MRs of 2^6 pages. The search in
> the cache started from entry 6.
> 
> +------------+---------------------+---------------------+
> |            |     Old behavior    |     New behavior    |
> |            +----------+----------+----------+----------+
> |            | 1 thread | 5 thread | 1 thread | 5 thread |
> +============+==========+==========+==========+==========+
> |  1,000 MRs |   14 ms  |   30 ms  |   14 ms  |   80 ms  |
> +------------+----------+----------+----------+----------+
> | 10,000 MRs |  135 ms  |   6 sec  |  173 ms  |  880 ms  |
> +------------+----------+----------+----------+----------+
> |100,000 MRs | 11.2 sec |  57 sec  | 1.74 sec |  8.8 sec |
> +------------+----------+----------+----------+----------+
> 
> Signed-off-by: Aharon Landau <aharonl@nvidia.com>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>  drivers/infiniband/hw/mlx5/mr.c | 26 +++++++++-----------------
>  1 file changed, 9 insertions(+), 17 deletions(-)

I'm surprised the cost is so high, I assume this has alot to do with
repeated calls to queue_adjust_cache_locked()? Maybe this should be
further investigated?

Anyhow, applied to for-next, thanks

Jason

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH rdma-next] RDMA/mlx5: Avoid taking MRs from larger MR cache pools when a pool is empty
  2021-10-04 23:00 ` Jason Gunthorpe
@ 2021-10-06  9:30   ` Leon Romanovsky
  0 siblings, 0 replies; 3+ messages in thread
From: Leon Romanovsky @ 2021-10-06  9:30 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, Aharon Landau, linux-kernel, linux-rdma

On Mon, Oct 04, 2021 at 08:00:03PM -0300, Jason Gunthorpe wrote:
> On Sun, Sep 26, 2021 at 11:31:43AM +0300, Leon Romanovsky wrote:
> > From: Aharon Landau <aharonl@nvidia.com>
> > 
> > Currently, if a cache entry is empty, the driver will try to take MRs
> > from larger cache entries. This behavior consumes a lot of memory.
> > In addition, when searching for an mkey in an entry, the entry is locked.
> > When using a multithreaded application with the old behavior, the threads
> > will block each other more often, which can hurt performance as can be
> > seen in the table below.
> > 
> > Therefore, avoid it by creating a new mkey when the requested cache entry
> > is empty.
> > 
> > The test was performed on a machine with
> > Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz 44 cores.
> > 
> > Here are the time measures for allocating MRs of 2^6 pages. The search in
> > the cache started from entry 6.
> > 
> > +------------+---------------------+---------------------+
> > |            |     Old behavior    |     New behavior    |
> > |            +----------+----------+----------+----------+
> > |            | 1 thread | 5 thread | 1 thread | 5 thread |
> > +============+==========+==========+==========+==========+
> > |  1,000 MRs |   14 ms  |   30 ms  |   14 ms  |   80 ms  |
> > +------------+----------+----------+----------+----------+
> > | 10,000 MRs |  135 ms  |   6 sec  |  173 ms  |  880 ms  |
> > +------------+----------+----------+----------+----------+
> > |100,000 MRs | 11.2 sec |  57 sec  | 1.74 sec |  8.8 sec |
> > +------------+----------+----------+----------+----------+
> > 
> > Signed-off-by: Aharon Landau <aharonl@nvidia.com>
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > ---
> >  drivers/infiniband/hw/mlx5/mr.c | 26 +++++++++-----------------
> >  1 file changed, 9 insertions(+), 17 deletions(-)
> 
> I'm surprised the cost is so high, I assume this has alot to do with
> repeated calls to queue_adjust_cache_locked()? Maybe this should be
> further investigated?

I don't think so, most of the overhead comes from entry lock, which
effectively stops any change to that shared entry.

> 
> Anyhow, applied to for-next, thanks

Thanks

> 
> Jason

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-10-06  9:30 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-26  8:31 [PATCH rdma-next] RDMA/mlx5: Avoid taking MRs from larger MR cache pools when a pool is empty Leon Romanovsky
2021-10-04 23:00 ` Jason Gunthorpe
2021-10-06  9:30   ` Leon Romanovsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).