LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH 0/4] Prepare for ECMDQ support
@ 2021-08-11 11:48 Zhen Lei
  2021-08-11 11:48 ` [PATCH 1/4] iommu/arm-smmu-v3: Use command queue batching helpers to improve performance Zhen Lei
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Zhen Lei @ 2021-08-11 11:48 UTC (permalink / raw)
  To: Will Deacon, Robin Murphy, Joerg Roedel, linux-arm-kernel, iommu,
	linux-kernel
  Cc: Zhen Lei

RFC --> v1
1. Resend the patches for ECMDQ preparation and remove the patches for ECMDQ implementation.
2. Patch 2 is modified. Other patches remain unchanged.
   1) Add static helper __arm_smmu_cmdq_issue_cmd(), and make arm_smmu_cmdq_issue_cmd()
      and arm_smmu_cmdq_issue_cmd_with_sync() implement based on it.
   2) Remove unused arm_smmu_cmdq_issue_sync().

RFC:
https://www.spinics.net/lists/arm-kernel/msg904879.html


Zhen Lei (4):
  iommu/arm-smmu-v3: Use command queue batching helpers to improve
    performance
  iommu/arm-smmu-v3: Add and use static helper function
    arm_smmu_cmdq_issue_cmd_with_sync()
  iommu/arm-smmu-v3: Add and use static helper function
    arm_smmu_get_cmdq()
  iommu/arm-smmu-v3: Extract reusable function
    __arm_smmu_cmdq_skip_err()

 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 71 ++++++++++++---------
 1 file changed, 42 insertions(+), 29 deletions(-)

-- 
2.26.0.106.g9fadedd


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/4] iommu/arm-smmu-v3: Use command queue batching helpers to improve performance
  2021-08-11 11:48 [PATCH 0/4] Prepare for ECMDQ support Zhen Lei
@ 2021-08-11 11:48 ` Zhen Lei
  2021-08-13 16:01   ` Robin Murphy
  2021-08-11 11:48 ` [PATCH 2/4] iommu/arm-smmu-v3: Add and use static helper function arm_smmu_cmdq_issue_cmd_with_sync() Zhen Lei
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Zhen Lei @ 2021-08-11 11:48 UTC (permalink / raw)
  To: Will Deacon, Robin Murphy, Joerg Roedel, linux-arm-kernel, iommu,
	linux-kernel
  Cc: Zhen Lei

The obvious key to the performance optimization of commit 587e6c10a7ce
("iommu/arm-smmu-v3: Reduce contention during command-queue insertion") is
to allow multiple cores to insert commands in parallel after a brief mutex
contention.

Obviously, inserting as many commands at a time as possible can reduce the
number of times the mutex contention participates, thereby improving the
overall performance. At least it reduces the number of calls to function
arm_smmu_cmdq_issue_cmdlist().

Therefore, use command queue batching helpers to insert multiple commands
at a time.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 235f9bdaeaf223b..c81cd929047f573 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1747,15 +1747,16 @@ static int arm_smmu_atc_inv_master(struct arm_smmu_master *master)
 {
 	int i;
 	struct arm_smmu_cmdq_ent cmd;
+	struct arm_smmu_cmdq_batch cmds = {};
 
 	arm_smmu_atc_inv_to_cmd(0, 0, 0, &cmd);
 
 	for (i = 0; i < master->num_streams; i++) {
 		cmd.atc.sid = master->streams[i].id;
-		arm_smmu_cmdq_issue_cmd(master->smmu, &cmd);
+		arm_smmu_cmdq_batch_add(master->smmu, &cmds, &cmd);
 	}
 
-	return arm_smmu_cmdq_issue_sync(master->smmu);
+	return arm_smmu_cmdq_batch_submit(master->smmu, &cmds);
 }
 
 int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
-- 
2.26.0.106.g9fadedd


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 2/4] iommu/arm-smmu-v3: Add and use static helper function arm_smmu_cmdq_issue_cmd_with_sync()
  2021-08-11 11:48 [PATCH 0/4] Prepare for ECMDQ support Zhen Lei
  2021-08-11 11:48 ` [PATCH 1/4] iommu/arm-smmu-v3: Use command queue batching helpers to improve performance Zhen Lei
@ 2021-08-11 11:48 ` Zhen Lei
  2021-08-11 11:48 ` [PATCH 3/4] iommu/arm-smmu-v3: Add and use static helper function arm_smmu_get_cmdq() Zhen Lei
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Zhen Lei @ 2021-08-11 11:48 UTC (permalink / raw)
  To: Will Deacon, Robin Murphy, Joerg Roedel, linux-arm-kernel, iommu,
	linux-kernel
  Cc: Zhen Lei

The obvious key to the performance optimization of commit 587e6c10a7ce
("iommu/arm-smmu-v3: Reduce contention during command-queue insertion") is
to allow multiple cores to insert commands in parallel after a brief mutex
contention.

Obviously, inserting as many commands at a time as possible can reduce the
number of times the mutex contention participates, thereby improving the
overall performance. At least it reduces the number of calls to function
arm_smmu_cmdq_issue_cmdlist().

Therefore, function arm_smmu_cmdq_issue_cmd_with_sync() is added to insert
the 'cmd+sync' commands at a time.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 35 +++++++++++----------
 1 file changed, 19 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index c81cd929047f573..282f95659580267 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -845,8 +845,9 @@ static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
 	return ret;
 }
 
-static int arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu,
-				   struct arm_smmu_cmdq_ent *ent)
+static int __arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu,
+				     struct arm_smmu_cmdq_ent *ent,
+				     bool sync)
 {
 	u64 cmd[CMDQ_ENT_DWORDS];
 
@@ -856,12 +857,19 @@ static int arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu,
 		return -EINVAL;
 	}
 
-	return arm_smmu_cmdq_issue_cmdlist(smmu, cmd, 1, false);
+	return arm_smmu_cmdq_issue_cmdlist(smmu, cmd, 1, sync);
 }
 
-static int arm_smmu_cmdq_issue_sync(struct arm_smmu_device *smmu)
+static int arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu,
+				   struct arm_smmu_cmdq_ent *ent)
 {
-	return arm_smmu_cmdq_issue_cmdlist(smmu, NULL, 0, true);
+	return __arm_smmu_cmdq_issue_cmd(smmu, ent, false);
+}
+
+static int arm_smmu_cmdq_issue_cmd_with_sync(struct arm_smmu_device *smmu,
+					     struct arm_smmu_cmdq_ent *ent)
+{
+	return __arm_smmu_cmdq_issue_cmd(smmu, ent, true);
 }
 
 static void arm_smmu_cmdq_batch_add(struct arm_smmu_device *smmu,
@@ -929,8 +937,7 @@ void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid)
 		.tlbi.asid = asid,
 	};
 
-	arm_smmu_cmdq_issue_cmd(smmu, &cmd);
-	arm_smmu_cmdq_issue_sync(smmu);
+	arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd);
 }
 
 static void arm_smmu_sync_cd(struct arm_smmu_domain *smmu_domain,
@@ -1211,8 +1218,7 @@ static void arm_smmu_sync_ste_for_sid(struct arm_smmu_device *smmu, u32 sid)
 		},
 	};
 
-	arm_smmu_cmdq_issue_cmd(smmu, &cmd);
-	arm_smmu_cmdq_issue_sync(smmu);
+	arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd);
 }
 
 static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
@@ -1824,8 +1830,7 @@ static void arm_smmu_tlb_inv_context(void *cookie)
 	} else {
 		cmd.opcode	= CMDQ_OP_TLBI_S12_VMALL;
 		cmd.tlbi.vmid	= smmu_domain->s2_cfg.vmid;
-		arm_smmu_cmdq_issue_cmd(smmu, &cmd);
-		arm_smmu_cmdq_issue_sync(smmu);
+		arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd);
 	}
 	arm_smmu_atc_inv_domain(smmu_domain, 0, 0, 0);
 }
@@ -3339,18 +3344,16 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
 
 	/* Invalidate any cached configuration */
 	cmd.opcode = CMDQ_OP_CFGI_ALL;
-	arm_smmu_cmdq_issue_cmd(smmu, &cmd);
-	arm_smmu_cmdq_issue_sync(smmu);
+	arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd);
 
 	/* Invalidate any stale TLB entries */
 	if (smmu->features & ARM_SMMU_FEAT_HYP) {
 		cmd.opcode = CMDQ_OP_TLBI_EL2_ALL;
-		arm_smmu_cmdq_issue_cmd(smmu, &cmd);
+		arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd);
 	}
 
 	cmd.opcode = CMDQ_OP_TLBI_NSNH_ALL;
-	arm_smmu_cmdq_issue_cmd(smmu, &cmd);
-	arm_smmu_cmdq_issue_sync(smmu);
+	arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd);
 
 	/* Event queue */
 	writeq_relaxed(smmu->evtq.q.q_base, smmu->base + ARM_SMMU_EVTQ_BASE);
-- 
2.26.0.106.g9fadedd


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 3/4] iommu/arm-smmu-v3: Add and use static helper function arm_smmu_get_cmdq()
  2021-08-11 11:48 [PATCH 0/4] Prepare for ECMDQ support Zhen Lei
  2021-08-11 11:48 ` [PATCH 1/4] iommu/arm-smmu-v3: Use command queue batching helpers to improve performance Zhen Lei
  2021-08-11 11:48 ` [PATCH 2/4] iommu/arm-smmu-v3: Add and use static helper function arm_smmu_cmdq_issue_cmd_with_sync() Zhen Lei
@ 2021-08-11 11:48 ` Zhen Lei
  2021-08-11 11:48 ` [PATCH 4/4] iommu/arm-smmu-v3: Extract reusable function __arm_smmu_cmdq_skip_err() Zhen Lei
  2021-08-13 14:33 ` [PATCH 0/4] Prepare for ECMDQ support Will Deacon
  4 siblings, 0 replies; 14+ messages in thread
From: Zhen Lei @ 2021-08-11 11:48 UTC (permalink / raw)
  To: Will Deacon, Robin Murphy, Joerg Roedel, linux-arm-kernel, iommu,
	linux-kernel
  Cc: Zhen Lei

One SMMU has only one normal CMDQ. Therefore, this CMDQ is used regardless
of the core on which the command is inserted. It can be referenced
directly through "smmu->cmdq". However, one SMMU has multiple ECMDQs, and
the ECMDQ used by the core on which the command insertion is executed may
be different. So the helper function arm_smmu_get_cmdq() is added, which
returns the CMDQ/ECMDQ that the current core should use. Currently, the
code that supports ECMDQ is not added. just simply returns "&smmu->cmdq".

Many subfunctions of arm_smmu_cmdq_issue_cmdlist() use "&smmu->cmdq" or
"&smmu->cmdq.q" directly. To support ECMDQ, they need to call the newly
added function arm_smmu_get_cmdq() instead.

Note that normal CMDQ is still required until ECMDQ is available.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 22 ++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 282f95659580267..be2df5ad2eb51b8 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -335,10 +335,14 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 	return 0;
 }
 
+static struct arm_smmu_cmdq *arm_smmu_get_cmdq(struct arm_smmu_device *smmu)
+{
+	return &smmu->cmdq;
+}
+
 static void arm_smmu_cmdq_build_sync_cmd(u64 *cmd, struct arm_smmu_device *smmu,
-					 u32 prod)
+					 struct arm_smmu_queue *q, u32 prod)
 {
-	struct arm_smmu_queue *q = &smmu->cmdq.q;
 	struct arm_smmu_cmdq_ent ent = {
 		.opcode = CMDQ_OP_CMD_SYNC,
 	};
@@ -579,7 +583,7 @@ static int arm_smmu_cmdq_poll_until_not_full(struct arm_smmu_device *smmu,
 {
 	unsigned long flags;
 	struct arm_smmu_queue_poll qp;
-	struct arm_smmu_cmdq *cmdq = &smmu->cmdq;
+	struct arm_smmu_cmdq *cmdq = arm_smmu_get_cmdq(smmu);
 	int ret = 0;
 
 	/*
@@ -595,7 +599,7 @@ static int arm_smmu_cmdq_poll_until_not_full(struct arm_smmu_device *smmu,
 
 	queue_poll_init(smmu, &qp);
 	do {
-		llq->val = READ_ONCE(smmu->cmdq.q.llq.val);
+		llq->val = READ_ONCE(cmdq->q.llq.val);
 		if (!queue_full(llq))
 			break;
 
@@ -614,7 +618,7 @@ static int __arm_smmu_cmdq_poll_until_msi(struct arm_smmu_device *smmu,
 {
 	int ret = 0;
 	struct arm_smmu_queue_poll qp;
-	struct arm_smmu_cmdq *cmdq = &smmu->cmdq;
+	struct arm_smmu_cmdq *cmdq = arm_smmu_get_cmdq(smmu);
 	u32 *cmd = (u32 *)(Q_ENT(&cmdq->q, llq->prod));
 
 	queue_poll_init(smmu, &qp);
@@ -637,12 +641,12 @@ static int __arm_smmu_cmdq_poll_until_consumed(struct arm_smmu_device *smmu,
 					       struct arm_smmu_ll_queue *llq)
 {
 	struct arm_smmu_queue_poll qp;
-	struct arm_smmu_cmdq *cmdq = &smmu->cmdq;
+	struct arm_smmu_cmdq *cmdq = arm_smmu_get_cmdq(smmu);
 	u32 prod = llq->prod;
 	int ret = 0;
 
 	queue_poll_init(smmu, &qp);
-	llq->val = READ_ONCE(smmu->cmdq.q.llq.val);
+	llq->val = READ_ONCE(cmdq->q.llq.val);
 	do {
 		if (queue_consumed(llq, prod))
 			break;
@@ -732,7 +736,7 @@ static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
 	u32 prod;
 	unsigned long flags;
 	bool owner;
-	struct arm_smmu_cmdq *cmdq = &smmu->cmdq;
+	struct arm_smmu_cmdq *cmdq = arm_smmu_get_cmdq(smmu);
 	struct arm_smmu_ll_queue llq = {
 		.max_n_shift = cmdq->q.llq.max_n_shift,
 	}, head = llq;
@@ -772,7 +776,7 @@ static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
 	arm_smmu_cmdq_write_entries(cmdq, cmds, llq.prod, n);
 	if (sync) {
 		prod = queue_inc_prod_n(&llq, n);
-		arm_smmu_cmdq_build_sync_cmd(cmd_sync, smmu, prod);
+		arm_smmu_cmdq_build_sync_cmd(cmd_sync, smmu, &cmdq->q, prod);
 		queue_write(Q_ENT(&cmdq->q, prod), cmd_sync, CMDQ_ENT_DWORDS);
 
 		/*
-- 
2.26.0.106.g9fadedd


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 4/4] iommu/arm-smmu-v3: Extract reusable function __arm_smmu_cmdq_skip_err()
  2021-08-11 11:48 [PATCH 0/4] Prepare for ECMDQ support Zhen Lei
                   ` (2 preceding siblings ...)
  2021-08-11 11:48 ` [PATCH 3/4] iommu/arm-smmu-v3: Add and use static helper function arm_smmu_get_cmdq() Zhen Lei
@ 2021-08-11 11:48 ` Zhen Lei
  2021-08-13 14:33 ` [PATCH 0/4] Prepare for ECMDQ support Will Deacon
  4 siblings, 0 replies; 14+ messages in thread
From: Zhen Lei @ 2021-08-11 11:48 UTC (permalink / raw)
  To: Will Deacon, Robin Murphy, Joerg Roedel, linux-arm-kernel, iommu,
	linux-kernel
  Cc: Zhen Lei

When SMMU_GERROR.CMDQP_ERR is different to SMMU_GERRORN.CMDQP_ERR, it
indicates that one or more errors have been encountered on a command queue
control page interface. We need to traverse all ECMDQs in that control
page to find all errors. For each ECMDQ error handling, it is much the
same as the CMDQ error handling. This common processing part is extracted
as a new function __arm_smmu_cmdq_skip_err().

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index be2df5ad2eb51b8..597cc0ff5ef40f0 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -359,7 +359,8 @@ static void arm_smmu_cmdq_build_sync_cmd(u64 *cmd, struct arm_smmu_device *smmu,
 	arm_smmu_cmdq_build_cmd(cmd, &ent);
 }
 
-static void arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu)
+static void __arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu,
+				     struct arm_smmu_queue *q)
 {
 	static const char * const cerror_str[] = {
 		[CMDQ_ERR_CERROR_NONE_IDX]	= "No error",
@@ -370,7 +371,6 @@ static void arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu)
 
 	int i;
 	u64 cmd[CMDQ_ENT_DWORDS];
-	struct arm_smmu_queue *q = &smmu->cmdq.q;
 	u32 cons = readl_relaxed(q->cons_reg);
 	u32 idx = FIELD_GET(CMDQ_CONS_ERR, cons);
 	struct arm_smmu_cmdq_ent cmd_sync = {
@@ -417,6 +417,11 @@ static void arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu)
 	queue_write(Q_ENT(q, cons), cmd, q->ent_dwords);
 }
 
+static void arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu)
+{
+	__arm_smmu_cmdq_skip_err(smmu, &smmu->cmdq.q);
+}
+
 /*
  * Command queue locking.
  * This is a form of bastardised rwlock with the following major changes:
-- 
2.26.0.106.g9fadedd


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/4] Prepare for ECMDQ support
  2021-08-11 11:48 [PATCH 0/4] Prepare for ECMDQ support Zhen Lei
                   ` (3 preceding siblings ...)
  2021-08-11 11:48 ` [PATCH 4/4] iommu/arm-smmu-v3: Extract reusable function __arm_smmu_cmdq_skip_err() Zhen Lei
@ 2021-08-13 14:33 ` Will Deacon
  4 siblings, 0 replies; 14+ messages in thread
From: Will Deacon @ 2021-08-13 14:33 UTC (permalink / raw)
  To: Zhen Lei, linux-arm-kernel, iommu, Joerg Roedel, Robin Murphy,
	linux-kernel
  Cc: catalin.marinas, kernel-team, Will Deacon

On Wed, 11 Aug 2021 19:48:48 +0800, Zhen Lei wrote:
> RFC --> v1
> 1. Resend the patches for ECMDQ preparation and remove the patches for ECMDQ implementation.
> 2. Patch 2 is modified. Other patches remain unchanged.
>    1) Add static helper __arm_smmu_cmdq_issue_cmd(), and make arm_smmu_cmdq_issue_cmd()
>       and arm_smmu_cmdq_issue_cmd_with_sync() implement based on it.
>    2) Remove unused arm_smmu_cmdq_issue_sync().
> 
> [...]

Applied to will (for-joerg/arm-smmu/updates), thanks!

[1/4] iommu/arm-smmu-v3: Use command queue batching helpers to improve performance
      https://git.kernel.org/arm64/c/eff19474b1bd
[2/4] iommu/arm-smmu-v3: Add and use static helper function arm_smmu_cmdq_issue_cmd_with_sync()
      https://git.kernel.org/arm64/c/4537f6f1e2d8
[3/4] iommu/arm-smmu-v3: Add and use static helper function arm_smmu_get_cmdq()
      https://git.kernel.org/arm64/c/8639cc83aac5
[4/4] iommu/arm-smmu-v3: Extract reusable function __arm_smmu_cmdq_skip_err()
      https://git.kernel.org/arm64/c/2cbeaf3f36eb

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] iommu/arm-smmu-v3: Use command queue batching helpers to improve performance
  2021-08-11 11:48 ` [PATCH 1/4] iommu/arm-smmu-v3: Use command queue batching helpers to improve performance Zhen Lei
@ 2021-08-13 16:01   ` Robin Murphy
  2021-08-13 16:45     ` John Garry
  0 siblings, 1 reply; 14+ messages in thread
From: Robin Murphy @ 2021-08-13 16:01 UTC (permalink / raw)
  To: Zhen Lei, Will Deacon, Joerg Roedel, linux-arm-kernel, iommu,
	linux-kernel

On 2021-08-11 12:48, Zhen Lei wrote:
> The obvious key to the performance optimization of commit 587e6c10a7ce
> ("iommu/arm-smmu-v3: Reduce contention during command-queue insertion") is
> to allow multiple cores to insert commands in parallel after a brief mutex
> contention.
> 
> Obviously, inserting as many commands at a time as possible can reduce the
> number of times the mutex contention participates, thereby improving the
> overall performance. At least it reduces the number of calls to function
> arm_smmu_cmdq_issue_cmdlist().
> 
> Therefore, use command queue batching helpers to insert multiple commands
> at a time.
> 
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 235f9bdaeaf223b..c81cd929047f573 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1747,15 +1747,16 @@ static int arm_smmu_atc_inv_master(struct arm_smmu_master *master)
>   {
>   	int i;
>   	struct arm_smmu_cmdq_ent cmd;
> +	struct arm_smmu_cmdq_batch cmds = {};

BTW, it looks like this has crossed over with John's patch removing these.

Robin.

>   
>   	arm_smmu_atc_inv_to_cmd(0, 0, 0, &cmd);
>   
>   	for (i = 0; i < master->num_streams; i++) {
>   		cmd.atc.sid = master->streams[i].id;
> -		arm_smmu_cmdq_issue_cmd(master->smmu, &cmd);
> +		arm_smmu_cmdq_batch_add(master->smmu, &cmds, &cmd);
>   	}
>   
> -	return arm_smmu_cmdq_issue_sync(master->smmu);
> +	return arm_smmu_cmdq_batch_submit(master->smmu, &cmds);
>   }
>   
>   int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] iommu/arm-smmu-v3: Use command queue batching helpers to improve performance
  2021-08-13 16:01   ` Robin Murphy
@ 2021-08-13 16:45     ` John Garry
  2021-08-16  2:15       ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 14+ messages in thread
From: John Garry @ 2021-08-13 16:45 UTC (permalink / raw)
  To: Robin Murphy, Zhen Lei, Will Deacon, Joerg Roedel,
	linux-arm-kernel, iommu, linux-kernel

On 13/08/2021 17:01, Robin Murphy wrote:
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 235f9bdaeaf223b..c81cd929047f573 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -1747,15 +1747,16 @@ static int arm_smmu_atc_inv_master(struct 
>> arm_smmu_master *master)
>>   {
>>       int i;
>>       struct arm_smmu_cmdq_ent cmd;
>> +    struct arm_smmu_cmdq_batch cmds = {};
> 
> BTW, it looks like this has crossed over with John's patch removing these.

It is only called from arm_smmu_disable_ats(), so not hot-path by the 
look for it. Or who even has ats HW ...?

But it should be at least cleaned-up for consistency. Leizhen?

Thanks,
John

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] iommu/arm-smmu-v3: Use command queue batching helpers to improve performance
  2021-08-13 16:45     ` John Garry
@ 2021-08-16  2:15       ` Leizhen (ThunderTown)
  2021-08-16  4:05         ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 14+ messages in thread
From: Leizhen (ThunderTown) @ 2021-08-16  2:15 UTC (permalink / raw)
  To: John Garry, Robin Murphy, Will Deacon, Joerg Roedel,
	linux-arm-kernel, iommu, linux-kernel



On 2021/8/14 0:45, John Garry wrote:
> On 13/08/2021 17:01, Robin Murphy wrote:
>>>
>>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>> index 235f9bdaeaf223b..c81cd929047f573 100644
>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>> @@ -1747,15 +1747,16 @@ static int arm_smmu_atc_inv_master(struct arm_smmu_master *master)
>>>   {
>>>       int i;
>>>       struct arm_smmu_cmdq_ent cmd;
>>> +    struct arm_smmu_cmdq_batch cmds = {};
>>
>> BTW, it looks like this has crossed over with John's patch removing these.
> 
> It is only called from arm_smmu_disable_ats(), so not hot-path by the look for it. Or who even has ats HW ...?
> 
> But it should be at least cleaned-up for consistency. Leizhen?

Okay, I'll revise it. But Will already took it. So I'm not sure whether to send v2 or a separate patch.

> 
> Thanks,
> John
> .
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] iommu/arm-smmu-v3: Use command queue batching helpers to improve performance
  2021-08-16  2:15       ` Leizhen (ThunderTown)
@ 2021-08-16  4:05         ` Leizhen (ThunderTown)
  2021-08-16  7:24           ` John Garry
  0 siblings, 1 reply; 14+ messages in thread
From: Leizhen (ThunderTown) @ 2021-08-16  4:05 UTC (permalink / raw)
  To: John Garry, Robin Murphy, Will Deacon, Joerg Roedel,
	linux-arm-kernel, iommu, linux-kernel



On 2021/8/16 10:15, Leizhen (ThunderTown) wrote:
> 
> 
> On 2021/8/14 0:45, John Garry wrote:
>> On 13/08/2021 17:01, Robin Murphy wrote:
>>>>
>>>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> index 235f9bdaeaf223b..c81cd929047f573 100644
>>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> @@ -1747,15 +1747,16 @@ static int arm_smmu_atc_inv_master(struct arm_smmu_master *master)
>>>>   {
>>>>       int i;
>>>>       struct arm_smmu_cmdq_ent cmd;
>>>> +    struct arm_smmu_cmdq_batch cmds = {};
>>>
>>> BTW, it looks like this has crossed over with John's patch removing these.
>>
>> It is only called from arm_smmu_disable_ats(), so not hot-path by the look for it. Or who even has ats HW ...?
>>
>> But it should be at least cleaned-up for consistency. Leizhen?
> 
> Okay, I'll revise it. But Will already took it. So I'm not sure whether to send v2 or a separate patch.

I think I'd better post v2, otherwise I should write the same description.

In addition, I find that function arm_smmu_cmdq_build_cmd() can also be optimized
slightly, three useless instructions can be reduced.

Case 1):
void arm_smmu_cmdq_build_cmd_tst1(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
{
        memset(cmd, 0, 1 << CMDQ_ENT_SZ_SHIFT);
        cmd[0] |= FIELD_PREP(CMDQ_0_OP, ent->opcode);
}
0000000000004608 <arm_smmu_cmdq_build_cmd_tst1>:
    4608:       a9007c1f        stp     xzr, xzr, [x0]
    460c:       39400022        ldrb    w2, [x1]
    4610:       f9400001        ldr     x1, [x0]
    4614:       aa020021        orr     x1, x1, x2
    4618:       f9000001        str     x1, [x0]
    461c:       d65f03c0        ret

Case 2):
void arm_smmu_cmdq_build_cmd_tst2(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
{
        int i;

        cmd[0] = FIELD_PREP(CMDQ_0_OP, ent->opcode);
        for (i = 1; i < CMDQ_ENT_DWORDS; i++)
                cmd[i] = 0;
}
0000000000004620 <arm_smmu_cmdq_build_cmd_tst2>:
    4620:       39400021        ldrb    w1, [x1]
    4624:       a9007c01        stp     x1, xzr, [x0]
    4628:       d65f03c0        ret
    462c:       d503201f        nop

Case 3):
void arm_smmu_cmdq_build_cmd_tst3(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
{
        memset(cmd, 0, 1 << CMDQ_ENT_SZ_SHIFT);
        cmd[0] = FIELD_PREP(CMDQ_0_OP, ent->opcode);
}
0000000000004630 <arm_smmu_cmdq_build_cmd_tst3>:
    4630:       a9007c1f        stp     xzr, xzr, [x0]
    4634:       39400021        ldrb    w1, [x1]
    4638:       f9000001        str     x1, [x0]
    463c:       d65f03c0        ret

> 
>>
>> Thanks,
>> John
>> .
>>
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] iommu/arm-smmu-v3: Use command queue batching helpers to improve performance
  2021-08-16  4:05         ` Leizhen (ThunderTown)
@ 2021-08-16  7:24           ` John Garry
  2021-08-16  7:47             ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 14+ messages in thread
From: John Garry @ 2021-08-16  7:24 UTC (permalink / raw)
  To: Leizhen (ThunderTown),
	Robin Murphy, Will Deacon, Joerg Roedel, linux-arm-kernel, iommu,
	linux-kernel

> In addition, I find that function arm_smmu_cmdq_build_cmd() can also be optimized
> slightly, three useless instructions can be reduced.

I think that you could optimise further by pre-building commonly used 
commands.

For example, CMD_SYNC without MSI polling is always the same. And then 
only different in 1 field for MSI polling.

But you need to check if the performance gain is worth the change.

> 
> Case 1):
> void arm_smmu_cmdq_build_cmd_tst1(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
> {
>          memset(cmd, 0, 1 << CMDQ_ENT_SZ_SHIFT);
>          cmd[0] |= FIELD_PREP(CMDQ_0_OP, ent->opcode);
> }
> 0000000000004608 <arm_smmu_cmdq_build_cmd_tst1>:
>      4608:       a9007c1f        stp     xzr, xzr, [x0]
>      460c:       39400022        ldrb    w2, [x1]
>      4610:       f9400001        ldr     x1, [x0]
>      4614:       aa020021        orr     x1, x1, x2
>      4618:       f9000001        str     x1, [x0]
>      461c:       d65f03c0        ret
> 
> Case 2):
> void arm_smmu_cmdq_build_cmd_tst2(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
> {
>          int i;
> 
>          cmd[0] = FIELD_PREP(CMDQ_0_OP, ent->opcode);
>          for (i = 1; i < CMDQ_ENT_DWORDS; i++)
>                  cmd[i] = 0;
> }
> 0000000000004620 <arm_smmu_cmdq_build_cmd_tst2>:
>      4620:       39400021        ldrb    w1, [x1]
>      4624:       a9007c01        stp     x1, xzr, [x0]
>      4628:       d65f03c0        ret
>      462c:       d503201f        nop
> 
> Case 3):
> void arm_smmu_cmdq_build_cmd_tst3(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
> {
>          memset(cmd, 0, 1 << CMDQ_ENT_SZ_SHIFT);
>          cmd[0] = FIELD_PREP(CMDQ_0_OP, ent->opcode);
> }
> 0000000000004630 <arm_smmu_cmdq_build_cmd_tst3>:
>      4630:       a9007c1f        stp     xzr, xzr, [x0]
>      4634:       39400021        ldrb    w1, [x1]
>      4638:       f9000001        str     x1, [x0]
>      463c:       d65f03c0        ret
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] iommu/arm-smmu-v3: Use command queue batching helpers to improve performance
  2021-08-16  7:24           ` John Garry
@ 2021-08-16  7:47             ` Leizhen (ThunderTown)
  2021-08-16  8:21               ` Will Deacon
  0 siblings, 1 reply; 14+ messages in thread
From: Leizhen (ThunderTown) @ 2021-08-16  7:47 UTC (permalink / raw)
  To: John Garry, Robin Murphy, Will Deacon, Joerg Roedel,
	linux-arm-kernel, iommu, linux-kernel



On 2021/8/16 15:24, John Garry wrote:
>> In addition, I find that function arm_smmu_cmdq_build_cmd() can also be optimized
>> slightly, three useless instructions can be reduced.
> 
> I think that you could optimise further by pre-building commonly used commands.
> 
> For example, CMD_SYNC without MSI polling is always the same. And then only different in 1 field for MSI polling.
> 
> But you need to check if the performance gain is worth the change.

Good advice. I can give it a try.

> 
>>
>> Case 1):
>> void arm_smmu_cmdq_build_cmd_tst1(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
>> {
>>          memset(cmd, 0, 1 << CMDQ_ENT_SZ_SHIFT);
>>          cmd[0] |= FIELD_PREP(CMDQ_0_OP, ent->opcode);
>> }
>> 0000000000004608 <arm_smmu_cmdq_build_cmd_tst1>:
>>      4608:       a9007c1f        stp     xzr, xzr, [x0]
>>      460c:       39400022        ldrb    w2, [x1]
>>      4610:       f9400001        ldr     x1, [x0]
>>      4614:       aa020021        orr     x1, x1, x2
>>      4618:       f9000001        str     x1, [x0]
>>      461c:       d65f03c0        ret
>>
>> Case 2):
>> void arm_smmu_cmdq_build_cmd_tst2(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
>> {
>>          int i;
>>
>>          cmd[0] = FIELD_PREP(CMDQ_0_OP, ent->opcode);
>>          for (i = 1; i < CMDQ_ENT_DWORDS; i++)
>>                  cmd[i] = 0;
>> }
>> 0000000000004620 <arm_smmu_cmdq_build_cmd_tst2>:
>>      4620:       39400021        ldrb    w1, [x1]
>>      4624:       a9007c01        stp     x1, xzr, [x0]
>>      4628:       d65f03c0        ret
>>      462c:       d503201f        nop
>>
>> Case 3):
>> void arm_smmu_cmdq_build_cmd_tst3(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
>> {
>>          memset(cmd, 0, 1 << CMDQ_ENT_SZ_SHIFT);
>>          cmd[0] = FIELD_PREP(CMDQ_0_OP, ent->opcode);
>> }
>> 0000000000004630 <arm_smmu_cmdq_build_cmd_tst3>:
>>      4630:       a9007c1f        stp     xzr, xzr, [x0]
>>      4634:       39400021        ldrb    w1, [x1]
>>      4638:       f9000001        str     x1, [x0]
>>      463c:       d65f03c0        ret
>>
> 
> .
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] iommu/arm-smmu-v3: Use command queue batching helpers to improve performance
  2021-08-16  7:47             ` Leizhen (ThunderTown)
@ 2021-08-16  8:21               ` Will Deacon
  2021-08-16  8:41                 ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 14+ messages in thread
From: Will Deacon @ 2021-08-16  8:21 UTC (permalink / raw)
  To: Leizhen (ThunderTown)
  Cc: John Garry, Robin Murphy, Joerg Roedel, linux-arm-kernel, iommu,
	linux-kernel

On Mon, Aug 16, 2021 at 03:47:58PM +0800, Leizhen (ThunderTown) wrote:
> 
> 
> On 2021/8/16 15:24, John Garry wrote:
> >> In addition, I find that function arm_smmu_cmdq_build_cmd() can also be optimized
> >> slightly, three useless instructions can be reduced.
> > 
> > I think that you could optimise further by pre-building commonly used commands.
> > 
> > For example, CMD_SYNC without MSI polling is always the same. And then only different in 1 field for MSI polling.
> > 
> > But you need to check if the performance gain is worth the change.
> 
> Good advice. I can give it a try.

Please send it as a new patch on top. I've queued the old one and sent
it to Joerg. Since this is just further cleanup, it can be done separately.

Will

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] iommu/arm-smmu-v3: Use command queue batching helpers to improve performance
  2021-08-16  8:21               ` Will Deacon
@ 2021-08-16  8:41                 ` Leizhen (ThunderTown)
  0 siblings, 0 replies; 14+ messages in thread
From: Leizhen (ThunderTown) @ 2021-08-16  8:41 UTC (permalink / raw)
  To: Will Deacon
  Cc: John Garry, Robin Murphy, Joerg Roedel, linux-arm-kernel, iommu,
	linux-kernel



On 2021/8/16 16:21, Will Deacon wrote:
> On Mon, Aug 16, 2021 at 03:47:58PM +0800, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2021/8/16 15:24, John Garry wrote:
>>>> In addition, I find that function arm_smmu_cmdq_build_cmd() can also be optimized
>>>> slightly, three useless instructions can be reduced.
>>>
>>> I think that you could optimise further by pre-building commonly used commands.
>>>
>>> For example, CMD_SYNC without MSI polling is always the same. And then only different in 1 field for MSI polling.
>>>
>>> But you need to check if the performance gain is worth the change.
>>
>> Good advice. I can give it a try.
> 
> Please send it as a new patch on top. I've queued the old one and sent
> it to Joerg. Since this is just further cleanup, it can be done separately.

OK, I'll post a separate patch later.

> 
> Will
> .
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-08-16  8:41 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-11 11:48 [PATCH 0/4] Prepare for ECMDQ support Zhen Lei
2021-08-11 11:48 ` [PATCH 1/4] iommu/arm-smmu-v3: Use command queue batching helpers to improve performance Zhen Lei
2021-08-13 16:01   ` Robin Murphy
2021-08-13 16:45     ` John Garry
2021-08-16  2:15       ` Leizhen (ThunderTown)
2021-08-16  4:05         ` Leizhen (ThunderTown)
2021-08-16  7:24           ` John Garry
2021-08-16  7:47             ` Leizhen (ThunderTown)
2021-08-16  8:21               ` Will Deacon
2021-08-16  8:41                 ` Leizhen (ThunderTown)
2021-08-11 11:48 ` [PATCH 2/4] iommu/arm-smmu-v3: Add and use static helper function arm_smmu_cmdq_issue_cmd_with_sync() Zhen Lei
2021-08-11 11:48 ` [PATCH 3/4] iommu/arm-smmu-v3: Add and use static helper function arm_smmu_get_cmdq() Zhen Lei
2021-08-11 11:48 ` [PATCH 4/4] iommu/arm-smmu-v3: Extract reusable function __arm_smmu_cmdq_skip_err() Zhen Lei
2021-08-13 14:33 ` [PATCH 0/4] Prepare for ECMDQ support Will Deacon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).