LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH v2 0/2] iommu/arm-smmu-v3: Perform some simple optimizations for arm_smmu_cmdq_build_cmd()
@ 2021-08-18  8:04 Zhen Lei
  2021-08-18  8:04 ` [PATCH v2 1/2] iommu/arm-smmu-v3: Properly handle the return value of arm_smmu_cmdq_build_cmd() Zhen Lei
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Zhen Lei @ 2021-08-18  8:04 UTC (permalink / raw)
  To: Will Deacon, Robin Murphy, Joerg Roedel, linux-arm-kernel, iommu,
	linux-kernel
  Cc: Zhen Lei, John Garry

v1 --> v2:
1. Add patch 1, Properly handle the return value of arm_smmu_cmdq_build_cmd()
2. Remove arm_smmu_cmdq_copy_cmd(). In addition, when build command fails, out_cmd is not filled.


Zhen Lei (2):
  iommu/arm-smmu-v3: Properly handle the return value of
    arm_smmu_cmdq_build_cmd()
  iommu/arm-smmu-v3: Simplify useless instructions in
    arm_smmu_cmdq_build_cmd()

 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 29 ++++++++++++++-------
 1 file changed, 20 insertions(+), 9 deletions(-)

-- 
2.26.0.106.g9fadedd


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v2 1/2] iommu/arm-smmu-v3: Properly handle the return value of arm_smmu_cmdq_build_cmd()
  2021-08-18  8:04 [PATCH v2 0/2] iommu/arm-smmu-v3: Perform some simple optimizations for arm_smmu_cmdq_build_cmd() Zhen Lei
@ 2021-08-18  8:04 ` Zhen Lei
  2021-08-18  8:04 ` [PATCH v2 2/2] iommu/arm-smmu-v3: Simplify useless instructions in arm_smmu_cmdq_build_cmd() Zhen Lei
  2021-10-04 12:05 ` [PATCH v2 0/2] iommu/arm-smmu-v3: Perform some simple optimizations for arm_smmu_cmdq_build_cmd() Will Deacon
  2 siblings, 0 replies; 6+ messages in thread
From: Zhen Lei @ 2021-08-18  8:04 UTC (permalink / raw)
  To: Will Deacon, Robin Murphy, Joerg Roedel, linux-arm-kernel, iommu,
	linux-kernel
  Cc: Zhen Lei, John Garry

1. Build command CMD_SYNC cannot fail. So the return value can be ignored.
2. The arm_smmu_cmdq_build_cmd() almost never fails, the addition of
   "unlikely()" can optimize the instruction pipeline.
3. Check the return value in arm_smmu_cmdq_batch_add().

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 3646bf8f021cd4c..01e95b56ffa07d1 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -409,10 +409,7 @@ static void __arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu,
 		dev_err(smmu->dev, "\t0x%016llx\n", (unsigned long long)cmd[i]);
 
 	/* Convert the erroneous command into a CMD_SYNC */
-	if (arm_smmu_cmdq_build_cmd(cmd, &cmd_sync)) {
-		dev_err(smmu->dev, "failed to convert to CMD_SYNC\n");
-		return;
-	}
+	arm_smmu_cmdq_build_cmd(cmd, &cmd_sync);
 
 	queue_write(Q_ENT(q, cons), cmd, q->ent_dwords);
 }
@@ -860,7 +857,7 @@ static int __arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu,
 {
 	u64 cmd[CMDQ_ENT_DWORDS];
 
-	if (arm_smmu_cmdq_build_cmd(cmd, ent)) {
+	if (unlikely(arm_smmu_cmdq_build_cmd(cmd, ent))) {
 		dev_warn(smmu->dev, "ignoring unknown CMDQ opcode 0x%x\n",
 			 ent->opcode);
 		return -EINVAL;
@@ -885,11 +882,20 @@ static void arm_smmu_cmdq_batch_add(struct arm_smmu_device *smmu,
 				    struct arm_smmu_cmdq_batch *cmds,
 				    struct arm_smmu_cmdq_ent *cmd)
 {
+	int index;
+
 	if (cmds->num == CMDQ_BATCH_ENTRIES) {
 		arm_smmu_cmdq_issue_cmdlist(smmu, cmds->cmds, cmds->num, false);
 		cmds->num = 0;
 	}
-	arm_smmu_cmdq_build_cmd(&cmds->cmds[cmds->num * CMDQ_ENT_DWORDS], cmd);
+
+	index = cmds->num * CMDQ_ENT_DWORDS;
+	if (unlikely(arm_smmu_cmdq_build_cmd(&cmds->cmds[index], cmd))) {
+		dev_warn(smmu->dev, "ignoring unknown CMDQ opcode 0x%x\n",
+			 cmd->opcode);
+		return;
+	}
+
 	cmds->num++;
 }
 
-- 
2.26.0.106.g9fadedd


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v2 2/2] iommu/arm-smmu-v3: Simplify useless instructions in arm_smmu_cmdq_build_cmd()
  2021-08-18  8:04 [PATCH v2 0/2] iommu/arm-smmu-v3: Perform some simple optimizations for arm_smmu_cmdq_build_cmd() Zhen Lei
  2021-08-18  8:04 ` [PATCH v2 1/2] iommu/arm-smmu-v3: Properly handle the return value of arm_smmu_cmdq_build_cmd() Zhen Lei
@ 2021-08-18  8:04 ` Zhen Lei
  2021-10-04 12:07   ` Will Deacon
  2021-10-04 12:05 ` [PATCH v2 0/2] iommu/arm-smmu-v3: Perform some simple optimizations for arm_smmu_cmdq_build_cmd() Will Deacon
  2 siblings, 1 reply; 6+ messages in thread
From: Zhen Lei @ 2021-08-18  8:04 UTC (permalink / raw)
  To: Will Deacon, Robin Murphy, Joerg Roedel, linux-arm-kernel, iommu,
	linux-kernel
  Cc: Zhen Lei, John Garry

Although the parameter 'cmd' is always passed by a local array variable,
and only this function modifies it, the compiler does not know this. The
compiler almost always reads the value of cmd[i] from memory rather than
directly using the value cached in the register. This generates many
useless instruction operations and affects the performance to some extent.

To guide the compiler for proper optimization, 'cmd' is defined as a local
array variable, marked as register, and copied to the output parameter at
a time when the function is returned.

The optimization effect can be viewed by running the "size arm-smmu-v3.o"
command.

Before:
   text    data     bss     dec     hex
  26954    1348      56   28358    6ec6

After:
   text    data     bss     dec     hex
  26762    1348      56   28166    6e06

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 01e95b56ffa07d1..7cec0c967f71d86 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -234,10 +234,12 @@ static int queue_remove_raw(struct arm_smmu_queue *q, u64 *ent)
 }
 
 /* High-level queue accessors */
-static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
+static int arm_smmu_cmdq_build_cmd(u64 *out_cmd, struct arm_smmu_cmdq_ent *ent)
 {
-	memset(cmd, 0, 1 << CMDQ_ENT_SZ_SHIFT);
-	cmd[0] |= FIELD_PREP(CMDQ_0_OP, ent->opcode);
+	register u64 cmd[CMDQ_ENT_DWORDS];
+
+	cmd[0] = FIELD_PREP(CMDQ_0_OP, ent->opcode);
+	cmd[1] = 0;
 
 	switch (ent->opcode) {
 	case CMDQ_OP_TLBI_EL2_ALL:
@@ -332,6 +334,9 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 		return -ENOENT;
 	}
 
+	out_cmd[0] = cmd[0];
+	out_cmd[1] = cmd[1];
+
 	return 0;
 }
 
-- 
2.26.0.106.g9fadedd


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 0/2] iommu/arm-smmu-v3: Perform some simple optimizations for arm_smmu_cmdq_build_cmd()
  2021-08-18  8:04 [PATCH v2 0/2] iommu/arm-smmu-v3: Perform some simple optimizations for arm_smmu_cmdq_build_cmd() Zhen Lei
  2021-08-18  8:04 ` [PATCH v2 1/2] iommu/arm-smmu-v3: Properly handle the return value of arm_smmu_cmdq_build_cmd() Zhen Lei
  2021-08-18  8:04 ` [PATCH v2 2/2] iommu/arm-smmu-v3: Simplify useless instructions in arm_smmu_cmdq_build_cmd() Zhen Lei
@ 2021-10-04 12:05 ` Will Deacon
  2 siblings, 0 replies; 6+ messages in thread
From: Will Deacon @ 2021-10-04 12:05 UTC (permalink / raw)
  To: Joerg Roedel, Zhen Lei, linux-kernel, linux-arm-kernel,
	Robin Murphy, iommu
  Cc: catalin.marinas, kernel-team, Will Deacon, John Garry

On Wed, 18 Aug 2021 16:04:50 +0800, Zhen Lei wrote:
> v1 --> v2:
> 1. Add patch 1, Properly handle the return value of arm_smmu_cmdq_build_cmd()
> 2. Remove arm_smmu_cmdq_copy_cmd(). In addition, when build command fails, out_cmd is not filled.
> 
> 
> Zhen Lei (2):
>   iommu/arm-smmu-v3: Properly handle the return value of
>     arm_smmu_cmdq_build_cmd()
>   iommu/arm-smmu-v3: Simplify useless instructions in
>     arm_smmu_cmdq_build_cmd()
> 
> [...]

Applied first patch only to will (for-joerg/arm-smmu/updates), thanks!

[1/2] iommu/arm-smmu-v3: Properly handle the return value of arm_smmu_cmdq_build_cmd()
      https://git.kernel.org/will/c/59d9bd727495

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 2/2] iommu/arm-smmu-v3: Simplify useless instructions in arm_smmu_cmdq_build_cmd()
  2021-08-18  8:04 ` [PATCH v2 2/2] iommu/arm-smmu-v3: Simplify useless instructions in arm_smmu_cmdq_build_cmd() Zhen Lei
@ 2021-10-04 12:07   ` Will Deacon
  2021-10-20  7:29     ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 6+ messages in thread
From: Will Deacon @ 2021-10-04 12:07 UTC (permalink / raw)
  To: Zhen Lei
  Cc: Robin Murphy, Joerg Roedel, linux-arm-kernel, iommu,
	linux-kernel, John Garry

On Wed, Aug 18, 2021 at 04:04:52PM +0800, Zhen Lei wrote:
> Although the parameter 'cmd' is always passed by a local array variable,
> and only this function modifies it, the compiler does not know this. The
> compiler almost always reads the value of cmd[i] from memory rather than
> directly using the value cached in the register. This generates many
> useless instruction operations and affects the performance to some extent.
> 
> To guide the compiler for proper optimization, 'cmd' is defined as a local
> array variable, marked as register, and copied to the output parameter at
> a time when the function is returned.
> 
> The optimization effect can be viewed by running the "size arm-smmu-v3.o"
> command.
> 
> Before:
>    text    data     bss     dec     hex
>   26954    1348      56   28358    6ec6
> 
> After:
>    text    data     bss     dec     hex
>   26762    1348      56   28166    6e06
> 
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 01e95b56ffa07d1..7cec0c967f71d86 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -234,10 +234,12 @@ static int queue_remove_raw(struct arm_smmu_queue *q, u64 *ent)
>  }
>  
>  /* High-level queue accessors */
> -static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
> +static int arm_smmu_cmdq_build_cmd(u64 *out_cmd, struct arm_smmu_cmdq_ent *ent)
>  {
> -	memset(cmd, 0, 1 << CMDQ_ENT_SZ_SHIFT);
> -	cmd[0] |= FIELD_PREP(CMDQ_0_OP, ent->opcode);
> +	register u64 cmd[CMDQ_ENT_DWORDS];

'register' is pretty badly specified outside of a handful of limited uses in
conjunction with inline asm, so I really don't think we should be using it
here.

> +	cmd[0] = FIELD_PREP(CMDQ_0_OP, ent->opcode);

This generates a compiler warning about taking the address of a 'register'
variable.

Will

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 2/2] iommu/arm-smmu-v3: Simplify useless instructions in arm_smmu_cmdq_build_cmd()
  2021-10-04 12:07   ` Will Deacon
@ 2021-10-20  7:29     ` Leizhen (ThunderTown)
  0 siblings, 0 replies; 6+ messages in thread
From: Leizhen (ThunderTown) @ 2021-10-20  7:29 UTC (permalink / raw)
  To: Will Deacon
  Cc: Robin Murphy, Joerg Roedel, linux-arm-kernel, iommu,
	linux-kernel, John Garry



On 2021/10/4 20:07, Will Deacon wrote:
> On Wed, Aug 18, 2021 at 04:04:52PM +0800, Zhen Lei wrote:
>> Although the parameter 'cmd' is always passed by a local array variable,
>> and only this function modifies it, the compiler does not know this. The
>> compiler almost always reads the value of cmd[i] from memory rather than
>> directly using the value cached in the register. This generates many
>> useless instruction operations and affects the performance to some extent.
>>
>> To guide the compiler for proper optimization, 'cmd' is defined as a local
>> array variable, marked as register, and copied to the output parameter at
>> a time when the function is returned.
>>
>> The optimization effect can be viewed by running the "size arm-smmu-v3.o"
>> command.
>>
>> Before:
>>    text    data     bss     dec     hex
>>   26954    1348      56   28358    6ec6
>>
>> After:
>>    text    data     bss     dec     hex
>>   26762    1348      56   28166    6e06
>>
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>> ---
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 11 ++++++++---
>>  1 file changed, 8 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 01e95b56ffa07d1..7cec0c967f71d86 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -234,10 +234,12 @@ static int queue_remove_raw(struct arm_smmu_queue *q, u64 *ent)
>>  }
>>  
>>  /* High-level queue accessors */
>> -static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
>> +static int arm_smmu_cmdq_build_cmd(u64 *out_cmd, struct arm_smmu_cmdq_ent *ent)
>>  {
>> -	memset(cmd, 0, 1 << CMDQ_ENT_SZ_SHIFT);
>> -	cmd[0] |= FIELD_PREP(CMDQ_0_OP, ent->opcode);
>> +	register u64 cmd[CMDQ_ENT_DWORDS];
> 
> 'register' is pretty badly specified outside of a handful of limited uses in
> conjunction with inline asm, so I really don't think we should be using it
> here.

OK, I think I was overly aggressive in the beginning, and I just tried to
remove the register modifier and get the same optimization.

> 
>> +	cmd[0] = FIELD_PREP(CMDQ_0_OP, ent->opcode);
> 
> This generates a compiler warning about taking the address of a 'register'
> variable.
> 
> Will
> .
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-10-20  7:30 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-18  8:04 [PATCH v2 0/2] iommu/arm-smmu-v3: Perform some simple optimizations for arm_smmu_cmdq_build_cmd() Zhen Lei
2021-08-18  8:04 ` [PATCH v2 1/2] iommu/arm-smmu-v3: Properly handle the return value of arm_smmu_cmdq_build_cmd() Zhen Lei
2021-08-18  8:04 ` [PATCH v2 2/2] iommu/arm-smmu-v3: Simplify useless instructions in arm_smmu_cmdq_build_cmd() Zhen Lei
2021-10-04 12:07   ` Will Deacon
2021-10-20  7:29     ` Leizhen (ThunderTown)
2021-10-04 12:05 ` [PATCH v2 0/2] iommu/arm-smmu-v3: Perform some simple optimizations for arm_smmu_cmdq_build_cmd() Will Deacon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).