LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [RFC PATCH v8 00/21] riscv: Add vector ISA support
@ 2021-09-08 17:45 Greentime Hu
  2021-09-08 17:45 ` [RFC PATCH v8 01/21] riscv: Separate patch for cflags and aflags Greentime Hu
                   ` (22 more replies)
  0 siblings, 23 replies; 55+ messages in thread
From: Greentime Hu @ 2021-09-08 17:45 UTC (permalink / raw)
  To: linux-riscv, linux-kernel, aou, palmer, paul.walmsley, vincent.chen

This patchset is implemented based on vector 1.0-rc1 spec to add vector
support in riscv Linux kernel. To make this happen, we defined a new
structure __riscv_v_state to save the vector related registers. It is used
for both kernel space and user space.

 - In kernel space, the datap pointer in __riscv_v_state will be allocated
   dynamically to save vector registers.
 - In user space,
	- In signal handler of user space, datap will point to the address
          of the __riscv_v_state data structure to save vector
          registers in stack. We also create a __reserved[] array for
	  future extensions.
	- In ptrace, the data will be put in ubuf in which we use
       	  riscv_vr_get()/riscv_vr_set() to get or set the
	  __riscv_v_state data structure from/to it, datap pointer
	  would be zeroed and vector registers will be copied to the
	  address right after the __riscv_v_state structure in ubuf.

This patchset also adds support for kernel mode vector, kernel XOR
implementation with vector ISA and includes several bug fixes and code
refinement.

This patchset is rebased to v5.14 and it is tested by running several
vector programs simultaneously. It also can get the correct ucontext_t in
signal handler and restore correct context after sigreturn. It is also
tested with ptrace() syscall to use PTRACE_GETREGSET/PTRACE_SETREGSET to
get/set the vector registers. I have tested vlen=128 and vlen=256 cases in
qemu-system-riscv64 provided by Frank Chang.

We have sent patches to glibc mailing list for ifunc support and sigcontext
changes. We will send patches for vector support to glibc mailing list
recently.

 [1] https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc

---
Changelog V8
 - Rebase to v5.14
 - Refine struct __riscv_v_state with struct __riscv_ctx_hdr
 - Refine has_vector into a static key
 - Defined __reserved space in struct sigcontext for vector and future extensions

Changelog V7
 - Add support for kernel mode vector
 - Add vector extension XOR implementation
 - Optimize task switch codes of vector
 - Allocate space for vector registers in start_thread()
 - Fix an illegal instruction exception when accessing vlenb
 - Optimize vector registers initialization
 - Initialize vector registers with proper vsetvli then it can work normally
 - Refine ptrace porting due to generic API changed
 - Code clean up

Changelog V6
 - Replace vle.v/vse.v instructions with vle8.v/vse8.v based on 0.9 spec
 - Add comments based on mailinglist feedback
 - Fix rv32 build error

Changelog V5
 - Using regset_size() correctly in generic ptrace
 - Fix the ptrace porting
 - Fix compile warning

Changelog V4
 - Support dynamic vlen
 - Fix bugs: lazy save/resotre, not saving vtype
 - Update VS bit offset based on latest vector spec
 - Add new vector csr based on latest vector spec
 - Code refine and removed unused macros

Changelog V3
 - Rebase linux-5.6-rc3 and tested with qemu
 - Seperate patches with Anup's advice
 - Give out a ABI puzzle with unlimited vlen

Changelog V2
 - Fixup typo "vecotr, fstate_save->vstate_save".
 - Fixup wrong saved registers' length in vector.S.
 - Seperate unrelated patches from this one.

Greentime Hu (15):
  riscv: Add new csr defines related to vector extension
  riscv: Add has_vector/riscv_vsize to save vector features.
  riscv: Add vector struct and assembler definitions
  riscv: Add task switch support for vector
  riscv: Add ptrace vector support
  riscv: Add sigcontext save/restore for vector
  riscv: Add support for kernel mode vector
  riscv: Use CSR_STATUS to replace sstatus in vector.S
  riscv: Add vector extension XOR implementation
  riscv: Initialize vector registers with proper vsetvli then it can
    work normally
  riscv: Optimize vector registers initialization
  riscv: Fix an illegal instruction exception when accessing vlenb
    without enable vector first
  riscv: Allocate space for vector registers in start_thread()
  riscv: Optimize task switch codes of vector
  riscv: Turn has_vector into a static key if VECTOR=y

Guo Ren (5):
  riscv: Separate patch for cflags and aflags
  riscv: Rename __switch_to_aux -> fpu
  riscv: Extending cpufeature.c to detect V-extension
  riscv: Add vector feature to compile
  riscv: Reset vector register

Vincent Chen (1):
  riscv: signal: Report signal frame size to userspace via auxv

 arch/riscv/Kconfig                       |   9 ++
 arch/riscv/Makefile                      |  19 ++-
 arch/riscv/include/asm/csr.h             |  16 ++-
 arch/riscv/include/asm/elf.h             |  41 +++---
 arch/riscv/include/asm/processor.h       |   3 +
 arch/riscv/include/asm/switch_to.h       |  71 +++++++++-
 arch/riscv/include/asm/vector.h          |  16 +++
 arch/riscv/include/asm/xor.h             |  74 ++++++++++
 arch/riscv/include/uapi/asm/auxvec.h     |   1 +
 arch/riscv/include/uapi/asm/hwcap.h      |   1 +
 arch/riscv/include/uapi/asm/ptrace.h     |  25 ++++
 arch/riscv/include/uapi/asm/sigcontext.h |  24 ++++
 arch/riscv/kernel/Makefile               |   7 +
 arch/riscv/kernel/asm-offsets.c          |   8 ++
 arch/riscv/kernel/cpufeature.c           |  16 +++
 arch/riscv/kernel/entry.S                |   6 +-
 arch/riscv/kernel/head.S                 |  22 ++-
 arch/riscv/kernel/kernel_mode_vector.c   | 158 +++++++++++++++++++++
 arch/riscv/kernel/process.c              |  49 +++++++
 arch/riscv/kernel/ptrace.c               |  71 ++++++++++
 arch/riscv/kernel/setup.c                |   4 +
 arch/riscv/kernel/signal.c               | 172 ++++++++++++++++++++++-
 arch/riscv/kernel/vector.S               |  81 +++++++++++
 arch/riscv/lib/Makefile                  |   1 +
 arch/riscv/lib/xor.S                     |  81 +++++++++++
 include/uapi/linux/elf.h                 |   1 +
 26 files changed, 941 insertions(+), 36 deletions(-)
 create mode 100644 arch/riscv/include/asm/vector.h
 create mode 100644 arch/riscv/include/asm/xor.h
 create mode 100644 arch/riscv/kernel/kernel_mode_vector.c
 create mode 100644 arch/riscv/kernel/vector.S
 create mode 100644 arch/riscv/lib/xor.S

-- 
2.31.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [RFC PATCH v8 01/21] riscv: Separate patch for cflags and aflags
  2021-09-08 17:45 [RFC PATCH v8 00/21] riscv: Add vector ISA support Greentime Hu
@ 2021-09-08 17:45 ` Greentime Hu
  2021-09-08 17:45 ` [RFC PATCH v8 02/21] riscv: Rename __switch_to_aux -> fpu Greentime Hu
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 55+ messages in thread
From: Greentime Hu @ 2021-09-08 17:45 UTC (permalink / raw)
  To: linux-riscv, linux-kernel, aou, palmer, paul.walmsley, vincent.chen

From: Guo Ren <ren_guo@c-sky.com>

Use "subst fd" in Makefile is a hack way and it's not convenient
to add new ISA feature. Just separate them into riscv-march-cflags
and riscv-march-aflags.

Signed-off-by: Guo Ren <ren_guo@c-sky.com>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
---
 arch/riscv/Makefile | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index bc74afdbf31e..428bd3bc202f 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -48,12 +48,18 @@ endif
 endif
 
 # ISA string setting
-riscv-march-$(CONFIG_ARCH_RV32I)	:= rv32ima
-riscv-march-$(CONFIG_ARCH_RV64I)	:= rv64ima
-riscv-march-$(CONFIG_FPU)		:= $(riscv-march-y)fd
-riscv-march-$(CONFIG_RISCV_ISA_C)	:= $(riscv-march-y)c
-KBUILD_CFLAGS += -march=$(subst fd,,$(riscv-march-y))
-KBUILD_AFLAGS += -march=$(riscv-march-y)
+riscv-march-cflags-$(CONFIG_ARCH_RV32I)		:= rv32ima
+riscv-march-cflags-$(CONFIG_ARCH_RV64I)		:= rv64ima
+riscv-march-$(CONFIG_FPU)			:= $(riscv-march-y)fd
+riscv-march-cflags-$(CONFIG_RISCV_ISA_C)	:= $(riscv-march-cflags-y)c
+
+riscv-march-aflags-$(CONFIG_ARCH_RV32I)		:= rv32ima
+riscv-march-aflags-$(CONFIG_ARCH_RV64I)		:= rv64ima
+riscv-march-aflags-$(CONFIG_FPU)		:= $(riscv-march-aflags-y)fd
+riscv-march-aflags-$(CONFIG_RISCV_ISA_C)	:= $(riscv-march-aflags-y)c
+
+KBUILD_CFLAGS += -march=$(riscv-march-cflags-y)
+KBUILD_AFLAGS += -march=$(riscv-march-aflags-y)
 
 KBUILD_CFLAGS += -mno-save-restore
 KBUILD_CFLAGS += -DCONFIG_PAGE_OFFSET=$(CONFIG_PAGE_OFFSET)
-- 
2.31.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [RFC PATCH v8 02/21] riscv: Rename __switch_to_aux -> fpu
  2021-09-08 17:45 [RFC PATCH v8 00/21] riscv: Add vector ISA support Greentime Hu
  2021-09-08 17:45 ` [RFC PATCH v8 01/21] riscv: Separate patch for cflags and aflags Greentime Hu
@ 2021-09-08 17:45 ` Greentime Hu
  2021-09-08 17:45 ` [RFC PATCH v8 03/21] riscv: Extending cpufeature.c to detect V-extension Greentime Hu
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 55+ messages in thread
From: Greentime Hu @ 2021-09-08 17:45 UTC (permalink / raw)
  To: linux-riscv, linux-kernel, aou, palmer, paul.walmsley, vincent.chen

From: Guo Ren <ren_guo@c-sky.com>

The name of __switch_to_aux is not clear and rename it with the
determine function: __switch_to_fpu. Next we could add other regs'
switch.

Signed-off-by: Guo Ren <ren_guo@c-sky.com>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
---
 arch/riscv/include/asm/switch_to.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
index 0a3f4f95c555..ec83770b3d98 100644
--- a/arch/riscv/include/asm/switch_to.h
+++ b/arch/riscv/include/asm/switch_to.h
@@ -45,7 +45,7 @@ static inline void fstate_restore(struct task_struct *task,
 	}
 }
 
-static inline void __switch_to_aux(struct task_struct *prev,
+static inline void __switch_to_fpu(struct task_struct *prev,
 				   struct task_struct *next)
 {
 	struct pt_regs *regs;
@@ -65,7 +65,7 @@ static __always_inline bool has_fpu(void)
 static __always_inline bool has_fpu(void) { return false; }
 #define fstate_save(task, regs) do { } while (0)
 #define fstate_restore(task, regs) do { } while (0)
-#define __switch_to_aux(__prev, __next) do { } while (0)
+#define __switch_to_fpu(__prev, __next) do { } while (0)
 #endif
 
 extern struct task_struct *__switch_to(struct task_struct *,
@@ -76,7 +76,7 @@ do {							\
 	struct task_struct *__prev = (prev);		\
 	struct task_struct *__next = (next);		\
 	if (has_fpu())					\
-		__switch_to_aux(__prev, __next);	\
+		__switch_to_fpu(__prev, __next);	\
 	((last) = __switch_to(__prev, __next));		\
 } while (0)
 
-- 
2.31.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [RFC PATCH v8 03/21] riscv: Extending cpufeature.c to detect V-extension
  2021-09-08 17:45 [RFC PATCH v8 00/21] riscv: Add vector ISA support Greentime Hu
  2021-09-08 17:45 ` [RFC PATCH v8 01/21] riscv: Separate patch for cflags and aflags Greentime Hu
  2021-09-08 17:45 ` [RFC PATCH v8 02/21] riscv: Rename __switch_to_aux -> fpu Greentime Hu
@ 2021-09-08 17:45 ` Greentime Hu
  2021-09-08 17:45 ` [RFC PATCH v8 04/21] riscv: Add new csr defines related to vector extension Greentime Hu
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 55+ messages in thread
From: Greentime Hu @ 2021-09-08 17:45 UTC (permalink / raw)
  To: linux-riscv, linux-kernel, aou, palmer, paul.walmsley, vincent.chen

From: Guo Ren <ren_guo@c-sky.com>

Current cpufeature.c doesn't support detecting V-extension, because
"rv64" also contain a 'v' letter and we need to skip it.

Signed-off-by: Guo Ren <ren_guo@c-sky.com>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
---
 arch/riscv/include/uapi/asm/hwcap.h | 1 +
 arch/riscv/kernel/cpufeature.c      | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/riscv/include/uapi/asm/hwcap.h b/arch/riscv/include/uapi/asm/hwcap.h
index 46dc3f5ee99f..c52bb7bbbabe 100644
--- a/arch/riscv/include/uapi/asm/hwcap.h
+++ b/arch/riscv/include/uapi/asm/hwcap.h
@@ -21,5 +21,6 @@
 #define COMPAT_HWCAP_ISA_F	(1 << ('F' - 'A'))
 #define COMPAT_HWCAP_ISA_D	(1 << ('D' - 'A'))
 #define COMPAT_HWCAP_ISA_C	(1 << ('C' - 'A'))
+#define COMPAT_HWCAP_ISA_V	(1 << ('V' - 'A'))
 
 #endif /* _UAPI_ASM_RISCV_HWCAP_H */
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index d959d207a40d..7069e55335d0 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -73,6 +73,7 @@ void __init riscv_fill_hwcap(void)
 	isa2hwcap['f'] = isa2hwcap['F'] = COMPAT_HWCAP_ISA_F;
 	isa2hwcap['d'] = isa2hwcap['D'] = COMPAT_HWCAP_ISA_D;
 	isa2hwcap['c'] = isa2hwcap['C'] = COMPAT_HWCAP_ISA_C;
+	isa2hwcap['v'] = isa2hwcap['V'] = COMPAT_HWCAP_ISA_V;
 
 	elf_hwcap = 0;
 
-- 
2.31.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [RFC PATCH v8 04/21] riscv: Add new csr defines related to vector extension
  2021-09-08 17:45 [RFC PATCH v8 00/21] riscv: Add vector ISA support Greentime Hu
                   ` (2 preceding siblings ...)
  2021-09-08 17:45 ` [RFC PATCH v8 03/21] riscv: Extending cpufeature.c to detect V-extension Greentime Hu
@ 2021-09-08 17:45 ` Greentime Hu
  2021-09-08 17:45 ` [RFC PATCH v8 05/21] riscv: Add vector feature to compile Greentime Hu
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 55+ messages in thread
From: Greentime Hu @ 2021-09-08 17:45 UTC (permalink / raw)
  To: linux-riscv, linux-kernel, aou, palmer, paul.walmsley, vincent.chen

Follow the riscv vector spec to add new csr numbers.

[guoren@linux.alibaba.com: first porting for new vector related csr]
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
Acked-by: Guo Ren <guoren@kernel.org>
Co-developed-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
---
 arch/riscv/include/asm/csr.h | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index 87ac65696871..069743102fac 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -24,6 +24,12 @@
 #define SR_FS_CLEAN	_AC(0x00004000, UL)
 #define SR_FS_DIRTY	_AC(0x00006000, UL)
 
+#define SR_VS           _AC(0x00000600, UL) /* Vector Status */
+#define SR_VS_OFF       _AC(0x00000000, UL)
+#define SR_VS_INITIAL   _AC(0x00000200, UL)
+#define SR_VS_CLEAN     _AC(0x00000400, UL)
+#define SR_VS_DIRTY     _AC(0x00000600, UL)
+
 #define SR_XS		_AC(0x00018000, UL) /* Extension Status */
 #define SR_XS_OFF	_AC(0x00000000, UL)
 #define SR_XS_INITIAL	_AC(0x00008000, UL)
@@ -31,9 +37,9 @@
 #define SR_XS_DIRTY	_AC(0x00018000, UL)
 
 #ifndef CONFIG_64BIT
-#define SR_SD		_AC(0x80000000, UL) /* FS/XS dirty */
+#define SR_SD		_AC(0x80000000, UL) /* FS/VS/XS dirty */
 #else
-#define SR_SD		_AC(0x8000000000000000, UL) /* FS/XS dirty */
+#define SR_SD		_AC(0x8000000000000000, UL) /* FS/VS/XS dirty */
 #endif
 
 /* SATP flags */
@@ -120,6 +126,12 @@
 #define CSR_MIMPID		0xf13
 #define CSR_MHARTID		0xf14
 
+#define CSR_VSTART		0x8
+#define CSR_VCSR		0xf
+#define CSR_VL			0xc20
+#define CSR_VTYPE		0xc21
+#define CSR_VLENB		0xc22
+
 #ifdef CONFIG_RISCV_M_MODE
 # define CSR_STATUS	CSR_MSTATUS
 # define CSR_IE		CSR_MIE
-- 
2.31.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [RFC PATCH v8 05/21] riscv: Add vector feature to compile
  2021-09-08 17:45 [RFC PATCH v8 00/21] riscv: Add vector ISA support Greentime Hu
                   ` (3 preceding siblings ...)
  2021-09-08 17:45 ` [RFC PATCH v8 04/21] riscv: Add new csr defines related to vector extension Greentime Hu
@ 2021-09-08 17:45 ` Greentime Hu
  2021-09-08 17:45 ` [RFC PATCH v8 06/21] riscv: Add has_vector/riscv_vsize to save vector features Greentime Hu
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 55+ messages in thread
From: Greentime Hu @ 2021-09-08 17:45 UTC (permalink / raw)
  To: linux-riscv, linux-kernel, aou, palmer, paul.walmsley, vincent.chen

From: Guo Ren <guoren@linux.alibaba.com>

This patch adds a new config option which could enable assembler's
vector feature.

Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
---
 arch/riscv/Kconfig  | 9 +++++++++
 arch/riscv/Makefile | 1 +
 2 files changed, 10 insertions(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 4f7b70ae7c31..619cfc370ee5 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -380,6 +380,15 @@ config FPU
 
 	  If you don't know what to do here, say Y.
 
+config VECTOR
+	bool "VECTOR support"
+	default n
+	help
+	  Say N here if you want to disable all vector related procedure
+	  in the kernel.
+
+	  If you don't know what to do here, say Y.
+
 endmenu
 
 menu "Kernel features"
diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index 428bd3bc202f..1450bdde5288 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -57,6 +57,7 @@ riscv-march-aflags-$(CONFIG_ARCH_RV32I)		:= rv32ima
 riscv-march-aflags-$(CONFIG_ARCH_RV64I)		:= rv64ima
 riscv-march-aflags-$(CONFIG_FPU)		:= $(riscv-march-aflags-y)fd
 riscv-march-aflags-$(CONFIG_RISCV_ISA_C)	:= $(riscv-march-aflags-y)c
+riscv-march-aflags-$(CONFIG_VECTOR)		:= $(riscv-march-aflags-y)v
 
 KBUILD_CFLAGS += -march=$(riscv-march-cflags-y)
 KBUILD_AFLAGS += -march=$(riscv-march-aflags-y)
-- 
2.31.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [RFC PATCH v8 06/21] riscv: Add has_vector/riscv_vsize to save vector features.
  2021-09-08 17:45 [RFC PATCH v8 00/21] riscv: Add vector ISA support Greentime Hu
                   ` (4 preceding siblings ...)
  2021-09-08 17:45 ` [RFC PATCH v8 05/21] riscv: Add vector feature to compile Greentime Hu
@ 2021-09-08 17:45 ` Greentime Hu
  2021-09-08 17:45 ` [RFC PATCH v8 07/21] riscv: Reset vector register Greentime Hu
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 55+ messages in thread
From: Greentime Hu @ 2021-09-08 17:45 UTC (permalink / raw)
  To: linux-riscv, linux-kernel, aou, palmer, paul.walmsley, vincent.chen

This patch is used to detect vector support status of CPU and use
riscv_vsize to save the size of all the vector registers. It assumes
all harts has the same capabilities in SMP system.

[guoren@linux.alibaba.com: add has_vector checking]
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
Co-developed-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
---
 arch/riscv/kernel/cpufeature.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 7069e55335d0..7265d947d981 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -21,6 +21,10 @@ static DECLARE_BITMAP(riscv_isa, RISCV_ISA_EXT_MAX) __read_mostly;
 #ifdef CONFIG_FPU
 __ro_after_init DEFINE_STATIC_KEY_FALSE(cpu_hwcap_fpu);
 #endif
+#ifdef CONFIG_VECTOR
+bool has_vector __read_mostly;
+unsigned long riscv_vsize __read_mostly;
+#endif
 
 /**
  * riscv_isa_extension_base() - Get base extension word
@@ -149,4 +153,12 @@ void __init riscv_fill_hwcap(void)
 	if (elf_hwcap & (COMPAT_HWCAP_ISA_F | COMPAT_HWCAP_ISA_D))
 		static_branch_enable(&cpu_hwcap_fpu);
 #endif
+
+#ifdef CONFIG_VECTOR
+	if (elf_hwcap & COMPAT_HWCAP_ISA_V) {
+		has_vector = true;
+		/* There are 32 vector registers with vlenb length. */
+		riscv_vsize = csr_read(CSR_VLENB) * 32;
+	}
+#endif
 }
-- 
2.31.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [RFC PATCH v8 07/21] riscv: Reset vector register
  2021-09-08 17:45 [RFC PATCH v8 00/21] riscv: Add vector ISA support Greentime Hu
                   ` (5 preceding siblings ...)
  2021-09-08 17:45 ` [RFC PATCH v8 06/21] riscv: Add has_vector/riscv_vsize to save vector features Greentime Hu
@ 2021-09-08 17:45 ` Greentime Hu
  2021-09-08 17:45 ` [RFC PATCH v8 08/21] riscv: Add vector struct and assembler definitions Greentime Hu
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 55+ messages in thread
From: Greentime Hu @ 2021-09-08 17:45 UTC (permalink / raw)
  To: linux-riscv, linux-kernel, aou, palmer, paul.walmsley, vincent.chen

From: Guo Ren <guoren@linux.alibaba.com>

Reset vector registers at boot-time and disable vector instructions
execution for kernel mode.

[greentime.hu@sifive.com: add comments]
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Co-developed-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
---
 arch/riscv/kernel/entry.S |  6 ++---
 arch/riscv/kernel/head.S  | 49 +++++++++++++++++++++++++++++++++++++--
 2 files changed, 50 insertions(+), 5 deletions(-)

diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
index 98f502654edd..ad0fa80ada81 100644
--- a/arch/riscv/kernel/entry.S
+++ b/arch/riscv/kernel/entry.S
@@ -77,10 +77,10 @@ _save_context:
 	 * Disable user-mode memory access as it should only be set in the
 	 * actual user copy routines.
 	 *
-	 * Disable the FPU to detect illegal usage of floating point in kernel
-	 * space.
+	 * Disable the FPU/Vector to detect illegal usage of floating point
+	 * or vector in kernel space.
 	 */
-	li t0, SR_SUM | SR_FS
+	li t0, SR_SUM | SR_FS | SR_VS
 
 	REG_L s0, TASK_TI_USER_SP(tp)
 	csrrc s1, CSR_STATUS, t0
diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
index fce5184b22c3..cf331f138142 100644
--- a/arch/riscv/kernel/head.S
+++ b/arch/riscv/kernel/head.S
@@ -241,10 +241,10 @@ pmp_done:
 .option pop
 
 	/*
-	 * Disable FPU to detect illegal usage of
+	 * Disable FPU & VECTOR to detect illegal usage of
 	 * floating point in kernel space
 	 */
-	li t0, SR_FS
+	li t0, SR_FS | SR_VS
 	csrc CSR_STATUS, t0
 
 #ifdef CONFIG_SMP
@@ -432,6 +432,51 @@ ENTRY(reset_regs)
 	csrw	fcsr, 0
 	/* note that the caller must clear SR_FS */
 #endif /* CONFIG_FPU */
+
+#ifdef CONFIG_VECTOR
+	csrr	t0, CSR_MISA
+	li	t1, (COMPAT_HWCAP_ISA_V >> 16)
+	slli	t1, t1, 16
+	and	t0, t0, t1
+	beqz	t0, .Lreset_regs_done
+
+	li	t1, SR_VS
+	csrs	CSR_STATUS, t1
+	vmv.v.i v0, 0
+	vmv.v.i v1, 0
+	vmv.v.i v2, 0
+	vmv.v.i v3, 0
+	vmv.v.i v4, 0
+	vmv.v.i v5, 0
+	vmv.v.i v6, 0
+	vmv.v.i v7, 0
+	vmv.v.i v8, 0
+	vmv.v.i v9, 0
+	vmv.v.i v10, 0
+	vmv.v.i v11, 0
+	vmv.v.i v12, 0
+	vmv.v.i v13, 0
+	vmv.v.i v14, 0
+	vmv.v.i v15, 0
+	vmv.v.i v16, 0
+	vmv.v.i v17, 0
+	vmv.v.i v18, 0
+	vmv.v.i v19, 0
+	vmv.v.i v20, 0
+	vmv.v.i v21, 0
+	vmv.v.i v22, 0
+	vmv.v.i v23, 0
+	vmv.v.i v24, 0
+	vmv.v.i v25, 0
+	vmv.v.i v26, 0
+	vmv.v.i v27, 0
+	vmv.v.i v28, 0
+	vmv.v.i v29, 0
+	vmv.v.i v30, 0
+	vmv.v.i v31, 0
+	/* note that the caller must clear SR_VS */
+#endif /* CONFIG_VECTOR */
+
 .Lreset_regs_done:
 	ret
 END(reset_regs)
-- 
2.31.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [RFC PATCH v8 08/21] riscv: Add vector struct and assembler definitions
  2021-09-08 17:45 [RFC PATCH v8 00/21] riscv: Add vector ISA support Greentime Hu
                   ` (6 preceding siblings ...)
  2021-09-08 17:45 ` [RFC PATCH v8 07/21] riscv: Reset vector register Greentime Hu
@ 2021-09-08 17:45 ` Greentime Hu
  2021-09-08 17:45 ` [RFC PATCH v8 09/21] riscv: Add task switch support for vector Greentime Hu
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 55+ messages in thread
From: Greentime Hu @ 2021-09-08 17:45 UTC (permalink / raw)
  To: linux-riscv, linux-kernel, aou, palmer, paul.walmsley, vincent.chen

Add vector state context struct in struct thread and asm-offsets.c
definitions.

The vector registers will be saved in datap pointer of __riscv_v_state. It
will be dynamically allocated in kernel space. It will be put right after
the __riscv_v_state data structure in user space.

Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
---
 arch/riscv/include/asm/processor.h   |  1 +
 arch/riscv/include/uapi/asm/ptrace.h | 11 +++++++++++
 arch/riscv/kernel/asm-offsets.c      |  6 ++++++
 3 files changed, 18 insertions(+)

diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
index 021ed64ee608..1b037c69d311 100644
--- a/arch/riscv/include/asm/processor.h
+++ b/arch/riscv/include/asm/processor.h
@@ -35,6 +35,7 @@ struct thread_struct {
 	unsigned long s[12];	/* s[0]: frame pointer */
 	struct __riscv_d_ext_state fstate;
 	unsigned long bad_cause;
+	struct __riscv_v_state vstate;
 };
 
 #define INIT_THREAD {					\
diff --git a/arch/riscv/include/uapi/asm/ptrace.h b/arch/riscv/include/uapi/asm/ptrace.h
index 882547f6bd5c..bd3b8a710246 100644
--- a/arch/riscv/include/uapi/asm/ptrace.h
+++ b/arch/riscv/include/uapi/asm/ptrace.h
@@ -77,6 +77,17 @@ union __riscv_fp_state {
 	struct __riscv_q_ext_state q;
 };
 
+struct __riscv_v_state {
+	unsigned long vstart;
+	unsigned long vl;
+	unsigned long vtype;
+	unsigned long vcsr;
+	void *datap;
+#if __riscv_xlen == 32
+	__u32 __padding;
+#endif
+};
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _UAPI_ASM_RISCV_PTRACE_H */
diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
index 90f8ce64fa6f..34f43c84723a 100644
--- a/arch/riscv/kernel/asm-offsets.c
+++ b/arch/riscv/kernel/asm-offsets.c
@@ -72,6 +72,12 @@ void asm_offsets(void)
 	OFFSET(TSK_STACK_CANARY, task_struct, stack_canary);
 #endif
 
+	OFFSET(RISCV_V_STATE_VSTART, __riscv_v_state, vstart);
+	OFFSET(RISCV_V_STATE_VL, __riscv_v_state, vl);
+	OFFSET(RISCV_V_STATE_VTYPE, __riscv_v_state, vtype);
+	OFFSET(RISCV_V_STATE_VCSR, __riscv_v_state, vcsr);
+	OFFSET(RISCV_V_STATE_DATAP, __riscv_v_state, datap);
+
 	DEFINE(PT_SIZE, sizeof(struct pt_regs));
 	OFFSET(PT_EPC, pt_regs, epc);
 	OFFSET(PT_RA, pt_regs, ra);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [RFC PATCH v8 09/21] riscv: Add task switch support for vector
  2021-09-08 17:45 [RFC PATCH v8 00/21] riscv: Add vector ISA support Greentime Hu
                   ` (7 preceding siblings ...)
  2021-09-08 17:45 ` [RFC PATCH v8 08/21] riscv: Add vector struct and assembler definitions Greentime Hu
@ 2021-09-08 17:45 ` Greentime Hu
  2021-09-13 12:21   ` Darius Rad
  2021-09-08 17:45 ` [RFC PATCH v8 10/21] riscv: Add ptrace vector support Greentime Hu
                   ` (13 subsequent siblings)
  22 siblings, 1 reply; 55+ messages in thread
From: Greentime Hu @ 2021-09-08 17:45 UTC (permalink / raw)
  To: linux-riscv, linux-kernel, aou, palmer, paul.walmsley, vincent.chen

This patch adds task switch support for vector. It supports partial lazy
save and restore mechanism. It also supports all lengths of vlen.

[guoren@linux.alibaba.com: First available porting to support vector
context switching]
[nick.knight@sifive.com: Rewrite vector.S to support dynamic vlen, xlen and
code refine]
[vincent.chen@sifive.co: Fix the might_sleep issue in vstate_save,
vstate_restore]
Co-developed-by: Nick Knight <nick.knight@sifive.com>
Signed-off-by: Nick Knight <nick.knight@sifive.com>
Co-developed-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
---
 arch/riscv/include/asm/switch_to.h | 66 +++++++++++++++++++++++
 arch/riscv/kernel/Makefile         |  1 +
 arch/riscv/kernel/process.c        | 38 ++++++++++++++
 arch/riscv/kernel/vector.S         | 84 ++++++++++++++++++++++++++++++
 4 files changed, 189 insertions(+)
 create mode 100644 arch/riscv/kernel/vector.S

diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
index ec83770b3d98..de0573dad78f 100644
--- a/arch/riscv/include/asm/switch_to.h
+++ b/arch/riscv/include/asm/switch_to.h
@@ -7,10 +7,12 @@
 #define _ASM_RISCV_SWITCH_TO_H
 
 #include <linux/jump_label.h>
+#include <linux/slab.h>
 #include <linux/sched/task_stack.h>
 #include <asm/processor.h>
 #include <asm/ptrace.h>
 #include <asm/csr.h>
+#include <asm/asm-offsets.h>
 
 #ifdef CONFIG_FPU
 extern void __fstate_save(struct task_struct *save_to);
@@ -68,6 +70,68 @@ static __always_inline bool has_fpu(void) { return false; }
 #define __switch_to_fpu(__prev, __next) do { } while (0)
 #endif
 
+#ifdef CONFIG_VECTOR
+extern bool has_vector;
+extern unsigned long riscv_vsize;
+extern void __vstate_save(struct __riscv_v_state *save_to, void *datap);
+extern void __vstate_restore(struct __riscv_v_state *restore_from, void *datap);
+
+static inline void __vstate_clean(struct pt_regs *regs)
+{
+	regs->status = (regs->status & ~(SR_VS)) | SR_VS_CLEAN;
+}
+
+static inline void vstate_off(struct task_struct *task,
+			      struct pt_regs *regs)
+{
+	regs->status = (regs->status & ~SR_VS) | SR_VS_OFF;
+}
+
+static inline void vstate_save(struct task_struct *task,
+			       struct pt_regs *regs)
+{
+	if ((regs->status & SR_VS) == SR_VS_DIRTY) {
+		struct __riscv_v_state *vstate = &(task->thread.vstate);
+
+		__vstate_save(vstate, vstate->datap);
+		__vstate_clean(regs);
+	}
+}
+
+static inline void vstate_restore(struct task_struct *task,
+				  struct pt_regs *regs)
+{
+	if ((regs->status & SR_VS) != SR_VS_OFF) {
+		struct __riscv_v_state *vstate = &(task->thread.vstate);
+
+		/* Allocate space for vector registers. */
+		if (!vstate->datap) {
+			vstate->datap = kzalloc(riscv_vsize, GFP_ATOMIC);
+			vstate->size = riscv_vsize;
+		}
+		__vstate_restore(vstate, vstate->datap);
+		__vstate_clean(regs);
+	}
+}
+
+static inline void __switch_to_vector(struct task_struct *prev,
+				   struct task_struct *next)
+{
+	struct pt_regs *regs;
+
+	regs = task_pt_regs(prev);
+	if (unlikely(regs->status & SR_SD))
+		vstate_save(prev, regs);
+	vstate_restore(next, task_pt_regs(next));
+}
+
+#else
+#define has_vector false
+#define vstate_save(task, regs) do { } while (0)
+#define vstate_restore(task, regs) do { } while (0)
+#define __switch_to_vector(__prev, __next) do { } while (0)
+#endif
+
 extern struct task_struct *__switch_to(struct task_struct *,
 				       struct task_struct *);
 
@@ -77,6 +141,8 @@ do {							\
 	struct task_struct *__next = (next);		\
 	if (has_fpu())					\
 		__switch_to_fpu(__prev, __next);	\
+	if (has_vector)					\
+		__switch_to_vector(__prev, __next);	\
 	((last) = __switch_to(__prev, __next));		\
 } while (0)
 
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index 3397ddac1a30..344078080839 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -40,6 +40,7 @@ obj-$(CONFIG_MMU) += vdso.o vdso/
 
 obj-$(CONFIG_RISCV_M_MODE)	+= traps_misaligned.o
 obj-$(CONFIG_FPU)		+= fpu.o
+obj-$(CONFIG_VECTOR)		+= vector.o
 obj-$(CONFIG_SMP)		+= smpboot.o
 obj-$(CONFIG_SMP)		+= smp.o
 obj-$(CONFIG_SMP)		+= cpu_ops.o
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 03ac3aa611f5..0b86e9e531c9 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -95,6 +95,16 @@ void start_thread(struct pt_regs *regs, unsigned long pc,
 		 */
 		fstate_restore(current, regs);
 	}
+
+	if (has_vector) {
+		regs->status |= SR_VS_INITIAL;
+		/*
+		 * Restore the initial value to the vector register
+		 * before starting the user program.
+		 */
+		vstate_restore(current, regs);
+	}
+
 	regs->epc = pc;
 	regs->sp = sp;
 }
@@ -110,15 +120,43 @@ void flush_thread(void)
 	fstate_off(current, task_pt_regs(current));
 	memset(&current->thread.fstate, 0, sizeof(current->thread.fstate));
 #endif
+#ifdef CONFIG_VECTOR
+	/* Reset vector state */
+	vstate_off(current, task_pt_regs(current));
+	memset(&current->thread.vstate, 0, sizeof(current->thread.vstate));
+#endif
 }
 
 int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 {
 	fstate_save(src, task_pt_regs(src));
+	if (has_vector)
+		/* To make sure every dirty vector context is saved. */
+		vstate_save(src, task_pt_regs(src));
 	*dst = *src;
+	if (has_vector) {
+		/* Copy vector context to the forked task from parent. */
+		if ((task_pt_regs(src)->status & SR_VS) != SR_VS_OFF) {
+			dst->thread.vstate.datap = kzalloc(riscv_vsize, GFP_KERNEL);
+			/* Failed to allocate memory. */
+			if (!dst->thread.vstate.datap)
+				return -ENOMEM;
+			/* Copy the src vector context to dst. */
+			memcpy(dst->thread.vstate.datap,
+			       src->thread.vstate.datap, riscv_vsize);
+		}
+	}
+
 	return 0;
 }
 
+void arch_release_task_struct(struct task_struct *tsk)
+{
+	/* Free the vector context of datap. */
+	if (has_vector)
+		kfree(tsk->thread.vstate.datap);
+}
+
 int copy_thread(unsigned long clone_flags, unsigned long usp, unsigned long arg,
 		struct task_struct *p, unsigned long tls)
 {
diff --git a/arch/riscv/kernel/vector.S b/arch/riscv/kernel/vector.S
new file mode 100644
index 000000000000..4c880b1c32aa
--- /dev/null
+++ b/arch/riscv/kernel/vector.S
@@ -0,0 +1,84 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2012 Regents of the University of California
+ * Copyright (C) 2017 SiFive
+ * Copyright (C) 2019 Alibaba Group Holding Limited
+ *
+ *   This program is free software; you can redistribute it and/or
+ *   modify it under the terms of the GNU General Public License
+ *   as published by the Free Software Foundation, version 2.
+ *
+ *   This program is distributed in the hope that it will be useful,
+ *   but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *   GNU General Public License for more details.
+ */
+
+#include <linux/linkage.h>
+
+#include <asm/asm.h>
+#include <asm/csr.h>
+#include <asm/asm-offsets.h>
+
+#define vstatep  a0
+#define datap    a1
+#define x_vstart t0
+#define x_vtype  t1
+#define x_vl     t2
+#define x_vcsr   t3
+#define incr     t4
+#define m_one    t5
+#define status   t6
+
+ENTRY(__vstate_save)
+	li      status, SR_VS
+	csrs    sstatus, status
+
+	csrr    x_vstart, CSR_VSTART
+	csrr    x_vtype, CSR_VTYPE
+	csrr    x_vl, CSR_VL
+	csrr    x_vcsr, CSR_VCSR
+	li      m_one, -1
+	vsetvli incr, m_one, e8, m8
+	vse8.v   v0, (datap)
+	add     datap, datap, incr
+	vse8.v   v8, (datap)
+	add     datap, datap, incr
+	vse8.v   v16, (datap)
+	add     datap, datap, incr
+	vse8.v   v24, (datap)
+
+	REG_S   x_vstart, RISCV_V_STATE_VSTART(vstatep)
+	REG_S   x_vtype, RISCV_V_STATE_VTYPE(vstatep)
+	REG_S   x_vl, RISCV_V_STATE_VL(vstatep)
+	REG_S   x_vcsr, RISCV_V_STATE_VCSR(vstatep)
+
+	csrc	sstatus, status
+	ret
+ENDPROC(__vstate_save)
+
+ENTRY(__vstate_restore)
+	li      status, SR_VS
+	csrs    sstatus, status
+
+	li      m_one, -1
+	vsetvli incr, m_one, e8, m8
+	vle8.v   v0, (datap)
+	add     datap, datap, incr
+	vle8.v   v8, (datap)
+	add     datap, datap, incr
+	vle8.v   v16, (datap)
+	add     datap, datap, incr
+	vle8.v   v24, (datap)
+
+	REG_L   x_vstart, RISCV_V_STATE_VSTART(vstatep)
+	REG_L   x_vtype, RISCV_V_STATE_VTYPE(vstatep)
+	REG_L   x_vl, RISCV_V_STATE_VL(vstatep)
+	REG_L   x_vcsr, RISCV_V_STATE_VCSR(vstatep)
+	vsetvl  x0, x_vl, x_vtype
+	csrw    CSR_VSTART, x_vstart
+	csrw    CSR_VCSR, x_vcsr
+
+	csrc	sstatus, status
+	ret
+ENDPROC(__vstate_restore)
-- 
2.31.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [RFC PATCH v8 10/21] riscv: Add ptrace vector support
  2021-09-08 17:45 [RFC PATCH v8 00/21] riscv: Add vector ISA support Greentime Hu
                   ` (8 preceding siblings ...)
  2021-09-08 17:45 ` [RFC PATCH v8 09/21] riscv: Add task switch support for vector Greentime Hu
@ 2021-09-08 17:45 ` Greentime Hu
  2021-09-08 17:45 ` [RFC PATCH v8 11/21] riscv: Add sigcontext save/restore for vector Greentime Hu
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 55+ messages in thread
From: Greentime Hu @ 2021-09-08 17:45 UTC (permalink / raw)
  To: linux-riscv, linux-kernel, aou, palmer, paul.walmsley, vincent.chen

This patch adds ptrace support for riscv vector. The vector registers will
be saved in datap pointer of __riscv_v_state. This pointer will be set
right after the __riscv_v_state data structure then it will be put in ubuf
for ptrace system call to get or set. It will check if the datap got from
ubuf is set to the correct address or not when the ptrace system call is
trying to set the vector registers.

Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
---
 arch/riscv/include/uapi/asm/ptrace.h | 14 ++++++
 arch/riscv/kernel/ptrace.c           | 71 ++++++++++++++++++++++++++++
 include/uapi/linux/elf.h             |  1 +
 3 files changed, 86 insertions(+)

diff --git a/arch/riscv/include/uapi/asm/ptrace.h b/arch/riscv/include/uapi/asm/ptrace.h
index bd3b8a710246..c3760395236c 100644
--- a/arch/riscv/include/uapi/asm/ptrace.h
+++ b/arch/riscv/include/uapi/asm/ptrace.h
@@ -83,11 +83,25 @@ struct __riscv_v_state {
 	unsigned long vtype;
 	unsigned long vcsr;
 	void *datap;
+	/*
+	 * In signal handler, datap will be set a correct user stack offset
+	 * and vector registers will be copied to the address of datap
+	 * pointer.
+	 *
+	 * In ptrace syscall, datap will be set to zero and the vector
+	 * registers will be copied to the address right after this
+	 * structure.
+	 */
 #if __riscv_xlen == 32
 	__u32 __padding;
 #endif
 };
 
+/*
+ * To define a practical maximum vlenb for ptrace and it may need to be
+ * extended someday.
+ */
+#define RISCV_MAX_VLENB (16384)
 #endif /* __ASSEMBLY__ */
 
 #endif /* _UAPI_ASM_RISCV_PTRACE_H */
diff --git a/arch/riscv/kernel/ptrace.c b/arch/riscv/kernel/ptrace.c
index 9c0511119bad..0bc11a70090c 100644
--- a/arch/riscv/kernel/ptrace.c
+++ b/arch/riscv/kernel/ptrace.c
@@ -27,6 +27,9 @@ enum riscv_regset {
 #ifdef CONFIG_FPU
 	REGSET_F,
 #endif
+#ifdef CONFIG_VECTOR
+	REGSET_V,
+#endif
 };
 
 static int riscv_gpr_get(struct task_struct *target,
@@ -85,6 +88,64 @@ static int riscv_fpr_set(struct task_struct *target,
 }
 #endif
 
+#ifdef CONFIG_VECTOR
+static int riscv_vr_get(struct task_struct *target,
+			const struct user_regset *regset,
+			struct membuf to)
+{
+	struct __riscv_v_state *vstate = &target->thread.vstate;
+
+	/*
+	 * Ensure the vector registers have been saved to the memory before
+	 * copying them to membuf.
+	 */
+	if (target == current)
+		vstate_save(current, task_pt_regs(current));
+
+	/* Copy vector header from vstate. */
+	membuf_write(&to, vstate, RISCV_V_STATE_DATAP);
+	membuf_zero(&to, sizeof(void *));
+#if __riscv_xlen == 32
+	membuf_zero(&to, sizeof(__u32));
+#endif
+
+	/* Copy all the vector registers from vstate. */
+	return membuf_write(&to, vstate->datap, riscv_vsize);
+}
+
+static int riscv_vr_set(struct task_struct *target,
+			 const struct user_regset *regset,
+			 unsigned int pos, unsigned int count,
+			 const void *kbuf, const void __user *ubuf)
+{
+	int ret, size;
+	struct __riscv_v_state *vstate = &target->thread.vstate;
+
+	/* Copy rest of the vstate except datap and __padding. */
+	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, vstate, 0,
+				 RISCV_V_STATE_DATAP);
+	if (unlikely(ret))
+		return ret;
+
+	/* Skip copy datap. */
+	size = sizeof(vstate->datap);
+	count -= size;
+	ubuf += size;
+#if __riscv_xlen == 32
+	/* Skip copy _padding. */
+	size = sizeof(vstate->__padding);
+	count -= size;
+	ubuf += size;
+#endif
+
+	/* Copy all the vector registers. */
+	pos = 0;
+	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, vstate->datap,
+				 0, riscv_vsize);
+	return ret;
+}
+#endif
+
 static const struct user_regset riscv_user_regset[] = {
 	[REGSET_X] = {
 		.core_note_type = NT_PRSTATUS,
@@ -104,6 +165,16 @@ static const struct user_regset riscv_user_regset[] = {
 		.set = riscv_fpr_set,
 	},
 #endif
+#ifdef CONFIG_VECTOR
+	[REGSET_V] = {
+		.core_note_type = NT_RISCV_VECTOR,
+		.align = 16,
+		.n = (32 * RISCV_MAX_VLENB)/sizeof(__u32),
+		.size = sizeof(__u32),
+		.regset_get = riscv_vr_get,
+		.set = riscv_vr_set,
+	},
+#endif
 };
 
 static const struct user_regset_view riscv_user_native_view = {
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index 61bf4774b8f2..60c5b873a8f6 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -432,6 +432,7 @@ typedef struct elf64_shdr {
 #define NT_MIPS_DSP	0x800		/* MIPS DSP ASE registers */
 #define NT_MIPS_FP_MODE	0x801		/* MIPS floating-point mode */
 #define NT_MIPS_MSA	0x802		/* MIPS SIMD registers */
+#define NT_RISCV_VECTOR	0x900		/* RISC-V vector registers */
 
 /* Note types with note name "GNU" */
 #define NT_GNU_PROPERTY_TYPE_0	5
-- 
2.31.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [RFC PATCH v8 11/21] riscv: Add sigcontext save/restore for vector
  2021-09-08 17:45 [RFC PATCH v8 00/21] riscv: Add vector ISA support Greentime Hu
                   ` (9 preceding siblings ...)
  2021-09-08 17:45 ` [RFC PATCH v8 10/21] riscv: Add ptrace vector support Greentime Hu
@ 2021-09-08 17:45 ` Greentime Hu
  2021-09-30  2:37   ` Ley Foon Tan
  2021-09-08 17:45 ` [RFC PATCH v8 12/21] riscv: signal: Report signal frame size to userspace via auxv Greentime Hu
                   ` (11 subsequent siblings)
  22 siblings, 1 reply; 55+ messages in thread
From: Greentime Hu @ 2021-09-08 17:45 UTC (permalink / raw)
  To: linux-riscv, linux-kernel, aou, palmer, paul.walmsley, vincent.chen

This patch adds sigcontext save/restore for vector. The vector registers
will be saved in datap pointer. The datap pointer will be allocated
dynamically when the task needs in kernel space. The datap pointer will
be set right after the __riscv_v_state data structure to save all the
vector registers in the signal handler stack.

Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
---
 arch/riscv/include/uapi/asm/sigcontext.h |  24 ++++
 arch/riscv/kernel/asm-offsets.c          |   2 +
 arch/riscv/kernel/setup.c                |   4 +
 arch/riscv/kernel/signal.c               | 164 ++++++++++++++++++++++-
 4 files changed, 190 insertions(+), 4 deletions(-)

diff --git a/arch/riscv/include/uapi/asm/sigcontext.h b/arch/riscv/include/uapi/asm/sigcontext.h
index 84f2dfcfdbce..b8a0fd7d7cfc 100644
--- a/arch/riscv/include/uapi/asm/sigcontext.h
+++ b/arch/riscv/include/uapi/asm/sigcontext.h
@@ -8,6 +8,23 @@
 
 #include <asm/ptrace.h>
 
+/* The Magic number for signal context frame header. */
+#define RVV_MAGIC	0x53465457
+#define END_MAGIC	0x0
+
+/* The size of END signal context header. */
+#define END_HDR_SIZE	0x0
+
+struct __riscv_ctx_hdr {
+	__u32 magic;
+	__u32 size;
+};
+
+struct __sc_riscv_v_state {
+	struct __riscv_ctx_hdr head;
+	struct __riscv_v_state v_state;
+} __attribute__((aligned(16)));
+
 /*
  * Signal context structure
  *
@@ -17,6 +34,13 @@
 struct sigcontext {
 	struct user_regs_struct sc_regs;
 	union __riscv_fp_state sc_fpregs;
+	/*
+	 * 4K + 128 reserved for vector state and future expansion.
+	 * This space is enough to store the vector context whose VLENB
+	 * is less or equal to 128.
+	 * (The size of the vector context is 4144 byte as VLENB is 128)
+	 */
+	__u8 __reserved[4224] __attribute__((__aligned__(16)));
 };
 
 #endif /* _UAPI_ASM_RISCV_SIGCONTEXT_H */
diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
index 34f43c84723a..62a766d54540 100644
--- a/arch/riscv/kernel/asm-offsets.c
+++ b/arch/riscv/kernel/asm-offsets.c
@@ -72,6 +72,8 @@ void asm_offsets(void)
 	OFFSET(TSK_STACK_CANARY, task_struct, stack_canary);
 #endif
 
+	OFFSET(RISCV_V_STATE_MAGIC, __riscv_ctx_hdr, magic);
+	OFFSET(RISCV_V_STATE_SIZE, __riscv_ctx_hdr, size);
 	OFFSET(RISCV_V_STATE_VSTART, __riscv_v_state, vstart);
 	OFFSET(RISCV_V_STATE_VL, __riscv_v_state, vl);
 	OFFSET(RISCV_V_STATE_VTYPE, __riscv_v_state, vtype);
diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index 120b2f6f71bc..6f489f7e6246 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -260,6 +260,8 @@ static void __init parse_dtb(void)
 #endif
 }
 
+extern void __init init_rt_signal_env(void);
+
 void __init setup_arch(char **cmdline_p)
 {
 	parse_dtb();
@@ -295,6 +297,8 @@ void __init setup_arch(char **cmdline_p)
 #endif
 
 	riscv_fill_hwcap();
+
+	init_rt_signal_env();
 }
 
 static int __init topology_init(void)
diff --git a/arch/riscv/kernel/signal.c b/arch/riscv/kernel/signal.c
index c2d5ecbe5526..6938cfa16b45 100644
--- a/arch/riscv/kernel/signal.c
+++ b/arch/riscv/kernel/signal.c
@@ -18,15 +18,16 @@
 #include <asm/csr.h>
 
 extern u32 __user_rt_sigreturn[2];
+static size_t rvv_sc_size;
 
 #define DEBUG_SIG 0
 
 struct rt_sigframe {
 	struct siginfo info;
-	struct ucontext uc;
 #ifndef CONFIG_MMU
 	u32 sigreturn_code[2];
 #endif
+	struct ucontext uc;
 };
 
 #ifdef CONFIG_FPU
@@ -83,16 +84,154 @@ static long save_fp_state(struct pt_regs *regs,
 #define restore_fp_state(task, regs) (0)
 #endif
 
+#ifdef CONFIG_VECTOR
+static long restore_v_state(struct pt_regs *regs, void **sc_reserved_ptr)
+{
+	long err;
+	struct __sc_riscv_v_state __user *state = (struct __sc_riscv_v_state *)(*sc_reserved_ptr);
+	void *datap;
+	__u32 magic;
+	__u32 size;
+
+	/* Get magic number and check it. */
+	err = __get_user(magic, &state->head.magic);
+	err = __get_user(size, &state->head.size);
+	if (unlikely(err))
+		return err;
+
+	if (magic != RVV_MAGIC || size != rvv_sc_size)
+		return -EINVAL;
+
+	/* Copy everything of __sc_riscv_v_state except datap. */
+	err = __copy_from_user(&current->thread.vstate, &state->v_state,
+			       RISCV_V_STATE_DATAP);
+	if (unlikely(err))
+		return err;
+
+	/* Copy the pointer datap itself. */
+	err = __get_user(datap, &state->v_state.datap);
+	if (unlikely(err))
+		return err;
+
+
+	/* Copy the whole vector content from user space datap. */
+	err = __copy_from_user(current->thread.vstate.datap, datap, riscv_vsize);
+	if (unlikely(err))
+		return err;
+
+	vstate_restore(current, regs);
+
+	/* Move sc_reserved_ptr to point the next signal context frame. */
+	*sc_reserved_ptr += size;
+
+	return err;
+}
+
+static long save_v_state(struct pt_regs *regs, void **sc_reserved_free_ptr)
+{
+	/*
+	 * Put __sc_riscv_v_state to the user's signal context space pointed
+	 * by sc_reserved_free_ptr and the datap point the address right
+	 * after __sc_riscv_v_state.
+	 */
+	struct __sc_riscv_v_state __user *state = (struct __sc_riscv_v_state *)
+		(*sc_reserved_free_ptr);
+	void *datap = state + 1;
+	long err;
+
+	*sc_reserved_free_ptr += rvv_sc_size;
+
+	err = __put_user(RVV_MAGIC, &state->head.magic);
+	err = __put_user(rvv_sc_size, &state->head.size);
+
+	vstate_save(current, regs);
+	/* Copy everything of vstate but datap. */
+	err = __copy_to_user(&state->v_state, &current->thread.vstate,
+			     RISCV_V_STATE_DATAP);
+	if (unlikely(err))
+		return err;
+
+	/* Copy the pointer datap itself. */
+	err = __put_user(datap, &state->v_state.datap);
+	if (unlikely(err))
+		return err;
+
+	/* Copy the whole vector content to user space datap. */
+	err = __copy_to_user(datap, current->thread.vstate.datap, riscv_vsize);
+
+	return err;
+}
+#else
+#define save_v_state(task, regs) (0)
+#define restore_v_state(task, regs) (0)
+#endif
+
 static long restore_sigcontext(struct pt_regs *regs,
 	struct sigcontext __user *sc)
 {
 	long err;
+	void *sc_reserved_ptr = sc->__reserved;
 	/* sc_regs is structured the same as the start of pt_regs */
 	err = __copy_from_user(regs, &sc->sc_regs, sizeof(sc->sc_regs));
 	/* Restore the floating-point state. */
 	if (has_fpu())
 		err |= restore_fp_state(regs, &sc->sc_fpregs);
+
+	while (1 && !err) {
+		__u32 magic, size;
+		struct __riscv_ctx_hdr *head = (struct __riscv_ctx_hdr *)sc_reserved_ptr;
+
+		err |= __get_user(magic, &head->magic);
+		err |= __get_user(size, &head->size);
+		if (err)
+			goto done;
+
+		switch (magic) {
+		case 0:
+			if (size)
+				goto invalid;
+			goto done;
+		case RVV_MAGIC:
+			if (!has_vector)
+				goto invalid;
+			if (size != rvv_sc_size)
+				goto invalid;
+			err |= restore_v_state(regs, &sc_reserved_ptr);
+			break;
+		default:
+			goto invalid;
+		}
+	}
+done:
 	return err;
+
+invalid:
+	return -EINVAL;
+}
+
+static size_t cal_rt_frame_size(void)
+{
+	struct rt_sigframe __user *frame;
+	static size_t frame_size;
+	size_t total_context_size = 0;
+	size_t sc_reserved_size = sizeof(frame->uc.uc_mcontext.__reserved);
+
+	if (frame_size)
+		goto done;
+
+	frame_size = sizeof(*frame);
+
+	if (has_vector)
+		total_context_size += rvv_sc_size;
+	/* Preserved a __riscv_ctx_hdr for END signal context header. */
+	total_context_size += sizeof(struct __riscv_ctx_hdr);
+
+	if (total_context_size > sc_reserved_size)
+		frame_size += (total_context_size - sc_reserved_size);
+
+done:
+	return round_up(frame_size, 16);
+
 }
 
 SYSCALL_DEFINE0(rt_sigreturn)
@@ -101,13 +240,14 @@ SYSCALL_DEFINE0(rt_sigreturn)
 	struct rt_sigframe __user *frame;
 	struct task_struct *task;
 	sigset_t set;
+	size_t frame_size = cal_rt_frame_size();
 
 	/* Always make any pending restarted system calls return -EINTR */
 	current->restart_block.fn = do_no_restart_syscall;
 
 	frame = (struct rt_sigframe __user *)regs->sp;
 
-	if (!access_ok(frame, sizeof(*frame)))
+	if (!access_ok(frame, frame_size))
 		goto badframe;
 
 	if (__copy_from_user(&set, &frame->uc.uc_sigmask, sizeof(set)))
@@ -140,11 +280,20 @@ static long setup_sigcontext(struct rt_sigframe __user *frame,
 {
 	struct sigcontext __user *sc = &frame->uc.uc_mcontext;
 	long err;
+	void *sc_reserved_free_ptr = sc->__reserved;
+
 	/* sc_regs is structured the same as the start of pt_regs */
 	err = __copy_to_user(&sc->sc_regs, regs, sizeof(sc->sc_regs));
 	/* Save the floating-point state. */
 	if (has_fpu())
 		err |= save_fp_state(regs, &sc->sc_fpregs);
+	/* Save the vector state. */
+	if (has_vector)
+		err |= save_v_state(regs, &sc_reserved_free_ptr);
+
+	/* Put END __riscv_ctx_hdr at the end. */
+	err = __put_user(END_MAGIC, &((struct __riscv_ctx_hdr *)sc_reserved_free_ptr)->magic);
+	err = __put_user(END_HDR_SIZE, &((struct __riscv_ctx_hdr *)sc_reserved_free_ptr)->size);
 	return err;
 }
 
@@ -176,9 +325,10 @@ static int setup_rt_frame(struct ksignal *ksig, sigset_t *set,
 {
 	struct rt_sigframe __user *frame;
 	long err = 0;
+	size_t frame_size = cal_rt_frame_size();
 
-	frame = get_sigframe(ksig, regs, sizeof(*frame));
-	if (!access_ok(frame, sizeof(*frame)))
+	frame = get_sigframe(ksig, regs, frame_size);
+	if (!access_ok(frame, frame_size))
 		return -EFAULT;
 
 	err |= copy_siginfo_to_user(&frame->info, &ksig->info);
@@ -319,3 +469,9 @@ asmlinkage __visible void do_notify_resume(struct pt_regs *regs,
 	if (thread_info_flags & _TIF_NOTIFY_RESUME)
 		tracehook_notify_resume(regs);
 }
+
+void init_rt_signal_env(void);
+void __init init_rt_signal_env(void)
+{
+	rvv_sc_size = sizeof(struct __sc_riscv_v_state) + riscv_vsize;
+}
-- 
2.31.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [RFC PATCH v8 12/21] riscv: signal: Report signal frame size to userspace via auxv
  2021-09-08 17:45 [RFC PATCH v8 00/21] riscv: Add vector ISA support Greentime Hu
                   ` (10 preceding siblings ...)
  2021-09-08 17:45 ` [RFC PATCH v8 11/21] riscv: Add sigcontext save/restore for vector Greentime Hu
@ 2021-09-08 17:45 ` Greentime Hu
  2021-09-08 17:45 ` [RFC PATCH v8 13/21] riscv: Add support for kernel mode vector Greentime Hu
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 55+ messages in thread
From: Greentime Hu @ 2021-09-08 17:45 UTC (permalink / raw)
  To: linux-riscv, linux-kernel, aou, palmer, paul.walmsley, vincent.chen

From: Vincent Chen <vincent.chen@sifive.com>

The vector register belongs to the signal context. They need to be stored
and restored as entering and leaving the signal handler. According to the
V-extension specification, the maximum length of the vector registers can
be 2^(XLEN-1). Hence, if userspace refers to the MINSIGSTKSZ to create a
sigframe, it may not be enough. To resolve this problem, this patch refers
to the commit 94b07c1f8c39c
("arm64: signal: Report signal frame size to userspace via auxv") to enable
userspace to know the minimum required sigframe size through the auxiliary
vector and use it to allocate enough memory for signal context.

Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
---
 arch/riscv/include/asm/elf.h         | 41 +++++++++++++++++-----------
 arch/riscv/include/asm/processor.h   |  2 ++
 arch/riscv/include/uapi/asm/auxvec.h |  1 +
 arch/riscv/kernel/signal.c           |  8 ++++++
 4 files changed, 36 insertions(+), 16 deletions(-)

diff --git a/arch/riscv/include/asm/elf.h b/arch/riscv/include/asm/elf.h
index f4b490cd0e5d..1102052aa593 100644
--- a/arch/riscv/include/asm/elf.h
+++ b/arch/riscv/include/asm/elf.h
@@ -58,22 +58,31 @@ extern unsigned long elf_hwcap;
 #define ELF_PLATFORM	(NULL)
 
 #ifdef CONFIG_MMU
-#define ARCH_DLINFO						\
-do {								\
-	NEW_AUX_ENT(AT_SYSINFO_EHDR,				\
-		(elf_addr_t)current->mm->context.vdso);		\
-	NEW_AUX_ENT(AT_L1I_CACHESIZE,				\
-		get_cache_size(1, CACHE_TYPE_INST));		\
-	NEW_AUX_ENT(AT_L1I_CACHEGEOMETRY,			\
-		get_cache_geometry(1, CACHE_TYPE_INST));	\
-	NEW_AUX_ENT(AT_L1D_CACHESIZE,				\
-		get_cache_size(1, CACHE_TYPE_DATA));		\
-	NEW_AUX_ENT(AT_L1D_CACHEGEOMETRY,			\
-		get_cache_geometry(1, CACHE_TYPE_DATA));	\
-	NEW_AUX_ENT(AT_L2_CACHESIZE,				\
-		get_cache_size(2, CACHE_TYPE_UNIFIED));		\
-	NEW_AUX_ENT(AT_L2_CACHEGEOMETRY,			\
-		get_cache_geometry(2, CACHE_TYPE_UNIFIED));	\
+#define ARCH_DLINFO						 \
+do {								 \
+	NEW_AUX_ENT(AT_SYSINFO_EHDR,				 \
+		(elf_addr_t)current->mm->context.vdso);		 \
+	NEW_AUX_ENT(AT_L1I_CACHESIZE,				 \
+		get_cache_size(1, CACHE_TYPE_INST));		 \
+	NEW_AUX_ENT(AT_L1I_CACHEGEOMETRY,			 \
+		get_cache_geometry(1, CACHE_TYPE_INST));	 \
+	NEW_AUX_ENT(AT_L1D_CACHESIZE,				 \
+		get_cache_size(1, CACHE_TYPE_DATA));		 \
+	NEW_AUX_ENT(AT_L1D_CACHEGEOMETRY,			 \
+		get_cache_geometry(1, CACHE_TYPE_DATA));	 \
+	NEW_AUX_ENT(AT_L2_CACHESIZE,				 \
+		get_cache_size(2, CACHE_TYPE_UNIFIED));		 \
+	NEW_AUX_ENT(AT_L2_CACHEGEOMETRY,			 \
+		get_cache_geometry(2, CACHE_TYPE_UNIFIED));	 \
+	/*							 \
+	 * Should always be nonzero unless there's a kernel bug. \
+	 * If we haven't determined a sensible value to give to	 \
+	 * userspace, omit the entry:				 \
+	 */							 \
+	if (likely(signal_minsigstksz))				 \
+		NEW_AUX_ENT(AT_MINSIGSTKSZ, signal_minsigstksz); \
+	else							 \
+		NEW_AUX_ENT(AT_IGNORE, 0);			 \
 } while (0)
 #define ARCH_HAS_SETUP_ADDITIONAL_PAGES
 struct linux_binprm;
diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
index 1b037c69d311..62c75645c606 100644
--- a/arch/riscv/include/asm/processor.h
+++ b/arch/riscv/include/asm/processor.h
@@ -7,6 +7,7 @@
 #define _ASM_RISCV_PROCESSOR_H
 
 #include <linux/const.h>
+#include <linux/cache.h>
 
 #include <vdso/processor.h>
 
@@ -74,6 +75,7 @@ int riscv_of_parent_hartid(struct device_node *node);
 extern void riscv_fill_hwcap(void);
 extern int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src);
 
+extern unsigned long signal_minsigstksz __ro_after_init;
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_RISCV_PROCESSOR_H */
diff --git a/arch/riscv/include/uapi/asm/auxvec.h b/arch/riscv/include/uapi/asm/auxvec.h
index 32c73ba1d531..6610d24e6662 100644
--- a/arch/riscv/include/uapi/asm/auxvec.h
+++ b/arch/riscv/include/uapi/asm/auxvec.h
@@ -33,5 +33,6 @@
 
 /* entries in ARCH_DLINFO */
 #define AT_VECTOR_SIZE_ARCH	7
+#define AT_MINSIGSTKSZ 51
 
 #endif /* _UAPI_ASM_RISCV_AUXVEC_H */
diff --git a/arch/riscv/kernel/signal.c b/arch/riscv/kernel/signal.c
index 6938cfa16b45..d30a3b588156 100644
--- a/arch/riscv/kernel/signal.c
+++ b/arch/riscv/kernel/signal.c
@@ -470,8 +470,16 @@ asmlinkage __visible void do_notify_resume(struct pt_regs *regs,
 		tracehook_notify_resume(regs);
 }
 
+unsigned long __ro_after_init signal_minsigstksz;
+
 void init_rt_signal_env(void);
 void __init init_rt_signal_env(void)
 {
 	rvv_sc_size = sizeof(struct __sc_riscv_v_state) + riscv_vsize;
+	/*
+	 * Determine the stack space required for guaranteed signal delivery.
+	 * The signal_minsigstksz will be populated into the AT_MINSIGSTKSZ entry
+	 * in the auxiliary array at process startup.
+	 */
+	signal_minsigstksz = cal_rt_frame_size();
 }
-- 
2.31.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [RFC PATCH v8 13/21] riscv: Add support for kernel mode vector
  2021-09-08 17:45 [RFC PATCH v8 00/21] riscv: Add vector ISA support Greentime Hu
                   ` (11 preceding siblings ...)
  2021-09-08 17:45 ` [RFC PATCH v8 12/21] riscv: signal: Report signal frame size to userspace via auxv Greentime Hu
@ 2021-09-08 17:45 ` Greentime Hu
  2021-09-09  6:17   ` Christoph Hellwig
  2021-09-08 17:45 ` [RFC PATCH v8 14/21] riscv: Use CSR_STATUS to replace sstatus in vector.S Greentime Hu
                   ` (9 subsequent siblings)
  22 siblings, 1 reply; 55+ messages in thread
From: Greentime Hu @ 2021-09-08 17:45 UTC (permalink / raw)
  To: linux-riscv, linux-kernel, aou, palmer, paul.walmsley, vincent.chen

Add <asm/vector.h> containing kernel_rvv_begin()/kernel_rvv_end() function
declarations and corresponding definitions in kernel_mode_vector.c

These are needed to wrap uses of vector in kernel mode.

Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
---
 arch/riscv/include/asm/vector.h        |  14 ++
 arch/riscv/kernel/Makefile             |   6 +
 arch/riscv/kernel/kernel_mode_vector.c | 184 +++++++++++++++++++++++++
 3 files changed, 204 insertions(+)
 create mode 100644 arch/riscv/include/asm/vector.h
 create mode 100644 arch/riscv/kernel/kernel_mode_vector.c

diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
new file mode 100644
index 000000000000..5d7f14453f68
--- /dev/null
+++ b/arch/riscv/include/asm/vector.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (C) 2020 SiFive
+ */
+
+#ifndef __ASM_RISCV_VECTOR_H
+#define __ASM_RISCV_VECTOR_H
+
+#include <linux/types.h>
+
+void kernel_rvv_begin(void);
+void kernel_rvv_end(void);
+
+#endif /* ! __ASM_RISCV_VECTOR_H */
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index 344078080839..a2efd3646cd8 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -41,6 +41,12 @@ obj-$(CONFIG_MMU) += vdso.o vdso/
 obj-$(CONFIG_RISCV_M_MODE)	+= traps_misaligned.o
 obj-$(CONFIG_FPU)		+= fpu.o
 obj-$(CONFIG_VECTOR)		+= vector.o
+obj-$(CONFIG_VECTOR)		+= kernel_mode_vector.o
+riscv-march-cflags-$(CONFIG_ARCH_RV32I)		:= rv32ima
+riscv-march-cflags-$(CONFIG_ARCH_RV64I)		:= rv64ima
+riscv-march-cflags-$(CONFIG_RISCV_ISA_C)	:= $(riscv-march-cflags-y)c
+riscv-march-cflags-$(CONFIG_VECTOR)		:= $(riscv-march-cflags-y)v
+CFLAGS_kernel_mode_vector.o	+= -march=$(riscv-march-cflags-y)
 obj-$(CONFIG_SMP)		+= smpboot.o
 obj-$(CONFIG_SMP)		+= smp.o
 obj-$(CONFIG_SMP)		+= cpu_ops.o
diff --git a/arch/riscv/kernel/kernel_mode_vector.c b/arch/riscv/kernel/kernel_mode_vector.c
new file mode 100644
index 000000000000..108cfafe7496
--- /dev/null
+++ b/arch/riscv/kernel/kernel_mode_vector.c
@@ -0,0 +1,184 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ * Author: Catalin Marinas <catalin.marinas@arm.com>
+ * Copyright (C) 2017 Linaro Ltd. <ard.biesheuvel@linaro.org>
+ * Copyright (C) 2020 SiFive
+ */
+#include <linux/compiler.h>
+#include <linux/irqflags.h>
+#include <linux/percpu.h>
+#include <linux/preempt.h>
+#include <linux/types.h>
+
+#include <asm/vector.h>
+#include <asm/switch_to.h>
+
+DECLARE_PER_CPU(bool, vector_context_busy);
+DEFINE_PER_CPU(bool, vector_context_busy);
+
+/*
+ * may_use_vector - whether it is allowable at this time to issue vector
+ *                instructions or access the vector register file
+ *
+ * Callers must not assume that the result remains true beyond the next
+ * preempt_enable() or return from softirq context.
+ */
+static __must_check inline bool may_use_vector(void)
+{
+	/*
+	 * vector_context_busy is only set while preemption is disabled,
+	 * and is clear whenever preemption is enabled. Since
+	 * this_cpu_read() is atomic w.r.t. preemption, vector_context_busy
+	 * cannot change under our feet -- if it's set we cannot be
+	 * migrated, and if it's clear we cannot be migrated to a CPU
+	 * where it is set.
+	 */
+	return !in_irq() && !irqs_disabled() && !in_nmi() &&
+	       !this_cpu_read(vector_context_busy);
+}
+
+
+
+/*
+ * Claim ownership of the CPU vector context for use by the calling context.
+ *
+ * The caller may freely manipulate the vector context metadata until
+ * put_cpu_vector_context() is called.
+ */
+static void get_cpu_vector_context(void)
+{
+	bool busy;
+
+	preempt_disable();
+	busy = __this_cpu_xchg(vector_context_busy, true);
+
+	WARN_ON(busy);
+}
+
+/*
+ * Release the CPU vector context.
+ *
+ * Must be called from a context in which get_cpu_vector_context() was
+ * previously called, with no call to put_cpu_vector_context() in the
+ * meantime.
+ */
+static void put_cpu_vector_context(void)
+{
+	bool busy = __this_cpu_xchg(vector_context_busy, false);
+
+	WARN_ON(!busy);
+	preempt_enable();
+}
+
+static void rvv_enable(void)
+{
+	csr_set(CSR_STATUS, SR_VS);
+}
+
+static void rvv_disable(void)
+{
+	csr_clear(CSR_STATUS, SR_VS);
+}
+
+static void vector_flush_cpu_state(void)
+{
+	long tmp;
+
+	__asm__ __volatile__ (
+		"vsetvli %0, x0, e8, m1\n"
+		"vmv.v.i v0, 0\n"
+		"vmv.v.i v1, 0\n"
+		"vmv.v.i v2, 0\n"
+		"vmv.v.i v3, 0\n"
+		"vmv.v.i v4, 0\n"
+		"vmv.v.i v5, 0\n"
+		"vmv.v.i v6, 0\n"
+		"vmv.v.i v7, 0\n"
+		"vmv.v.i v8, 0\n"
+		"vmv.v.i v9, 0\n"
+		"vmv.v.i v10, 0\n"
+		"vmv.v.i v11, 0\n"
+		"vmv.v.i v12, 0\n"
+		"vmv.v.i v13, 0\n"
+		"vmv.v.i v14, 0\n"
+		"vmv.v.i v15, 0\n"
+		"vmv.v.i v16, 0\n"
+		"vmv.v.i v17, 0\n"
+		"vmv.v.i v18, 0\n"
+		"vmv.v.i v19, 0\n"
+		"vmv.v.i v20, 0\n"
+		"vmv.v.i v21, 0\n"
+		"vmv.v.i v22, 0\n"
+		"vmv.v.i v23, 0\n"
+		"vmv.v.i v24, 0\n"
+		"vmv.v.i v25, 0\n"
+		"vmv.v.i v26, 0\n"
+		"vmv.v.i v27, 0\n"
+		"vmv.v.i v28, 0\n"
+		"vmv.v.i v29, 0\n"
+		"vmv.v.i v30, 0\n"
+		"vmv.v.i v31, 0\n":"=r"(tmp)::);
+}
+
+/*
+ * kernel_rvv_begin(): obtain the CPU vector registers for use by the calling
+ * context
+ *
+ * Must not be called unless may_use_vector() returns true.
+ * Task context in the vector registers is saved back to memory as necessary.
+ *
+ * A matching call to kernel_rvv_end() must be made before returning from the
+ * calling context.
+ *
+ * The caller may freely use the vector registers until kernel_rvv_end() is
+ * called.
+ */
+void kernel_rvv_begin(void)
+{
+	if (WARN_ON(!has_vector))
+		return;
+
+	WARN_ON(!may_use_vector());
+
+	/* Acquire kernel mode vector */
+	get_cpu_vector_context();
+
+	/* Save vector state, if any */
+	vstate_save(current, task_pt_regs(current));
+
+	/* Enable vector */
+	rvv_enable();
+
+	/* Invalidate vector regs */
+	vector_flush_cpu_state();
+}
+EXPORT_SYMBOL(kernel_rvv_begin);
+
+/*
+ * kernel_rvv_end(): give the CPU vector registers back to the current task
+ *
+ * Must be called from a context in which kernel_rvv_begin() was previously
+ * called, with no call to kernel_rvv_end() in the meantime.
+ *
+ * The caller must not use the vector registers after this function is called,
+ * unless kernel_rvv_begin() is called again in the meantime.
+ */
+void kernel_rvv_end(void)
+{
+	if (WARN_ON(!has_vector))
+		return;
+
+	/* Invalidate vector regs */
+	vector_flush_cpu_state();
+
+	/* Restore vector state, if any */
+	vstate_restore(current, task_pt_regs(current));
+
+	/* disable vector */
+	rvv_disable();
+
+	/* release kernel mode vector */
+	put_cpu_vector_context();
+}
+EXPORT_SYMBOL(kernel_rvv_end);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [RFC PATCH v8 14/21] riscv: Use CSR_STATUS to replace sstatus in vector.S
  2021-09-08 17:45 [RFC PATCH v8 00/21] riscv: Add vector ISA support Greentime Hu
                   ` (12 preceding siblings ...)
  2021-09-08 17:45 ` [RFC PATCH v8 13/21] riscv: Add support for kernel mode vector Greentime Hu
@ 2021-09-08 17:45 ` Greentime Hu
  2021-09-08 17:45 ` [RFC PATCH v8 15/21] riscv: Add vector extension XOR implementation Greentime Hu
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 55+ messages in thread
From: Greentime Hu @ 2021-09-08 17:45 UTC (permalink / raw)
  To: linux-riscv, linux-kernel, aou, palmer, paul.walmsley, vincent.chen

It should use the same logic here in both m-mode and s-mode.

Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
---
 arch/riscv/kernel/vector.S | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/riscv/kernel/vector.S b/arch/riscv/kernel/vector.S
index 4c880b1c32aa..4f0c5a166e4e 100644
--- a/arch/riscv/kernel/vector.S
+++ b/arch/riscv/kernel/vector.S
@@ -32,7 +32,7 @@
 
 ENTRY(__vstate_save)
 	li      status, SR_VS
-	csrs    sstatus, status
+	csrs    CSR_STATUS, status
 
 	csrr    x_vstart, CSR_VSTART
 	csrr    x_vtype, CSR_VTYPE
@@ -53,13 +53,13 @@ ENTRY(__vstate_save)
 	REG_S   x_vl, RISCV_V_STATE_VL(vstatep)
 	REG_S   x_vcsr, RISCV_V_STATE_VCSR(vstatep)
 
-	csrc	sstatus, status
+	csrc	CSR_STATUS, status
 	ret
 ENDPROC(__vstate_save)
 
 ENTRY(__vstate_restore)
 	li      status, SR_VS
-	csrs    sstatus, status
+	csrs    CSR_STATUS, status
 
 	li      m_one, -1
 	vsetvli incr, m_one, e8, m8
@@ -79,6 +79,6 @@ ENTRY(__vstate_restore)
 	csrw    CSR_VSTART, x_vstart
 	csrw    CSR_VCSR, x_vcsr
 
-	csrc	sstatus, status
+	csrc	CSR_STATUS, status
 	ret
 ENDPROC(__vstate_restore)
-- 
2.31.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [RFC PATCH v8 15/21] riscv: Add vector extension XOR implementation
  2021-09-08 17:45 [RFC PATCH v8 00/21] riscv: Add vector ISA support Greentime Hu
                   ` (13 preceding siblings ...)
  2021-09-08 17:45 ` [RFC PATCH v8 14/21] riscv: Use CSR_STATUS to replace sstatus in vector.S Greentime Hu
@ 2021-09-08 17:45 ` Greentime Hu
  2021-09-09  6:12   ` Christoph Hellwig
  2021-09-14  8:29   ` Ley Foon Tan
  2021-09-08 17:45 ` [RFC PATCH v8 16/21] riscv: Initialize vector registers with proper vsetvli then it can work normally Greentime Hu
                   ` (7 subsequent siblings)
  22 siblings, 2 replies; 55+ messages in thread
From: Greentime Hu @ 2021-09-08 17:45 UTC (permalink / raw)
  To: linux-riscv, linux-kernel, aou, palmer, paul.walmsley, vincent.chen

This patch adds support for vector optimized XOR it is tested in spike and
qemu.

Logs in spike:
[    0.008365] xor: measuring software checksum speed
[    0.048885]    8regs     :  1719.000 MB/sec
[    0.089080]    32regs    :  1717.000 MB/sec
[    0.129275]    rvv       :  7043.000 MB/sec
[    0.129525] xor: using function: rvv (7043.000 MB/sec)

Logs in qemu:
[    0.098943] xor: measuring software checksum speed
[    0.139391]    8regs     :  2911.000 MB/sec
[    0.181079]    32regs    :  2813.000 MB/sec
[    0.224260]    rvv       :    45.000 MB/sec
[    0.225586] xor: using function: 8regs (2911.000 MB/sec)

Co-developed-by: Han-Kuan Chen <hankuan.chen@sifive.com>
Signed-off-by: Han-Kuan Chen <hankuan.chen@sifive.com>
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
---
 arch/riscv/include/asm/xor.h | 74 ++++++++++++++++++++++++++++++++
 arch/riscv/lib/Makefile      |  1 +
 arch/riscv/lib/xor.S         | 81 ++++++++++++++++++++++++++++++++++++
 3 files changed, 156 insertions(+)
 create mode 100644 arch/riscv/include/asm/xor.h
 create mode 100644 arch/riscv/lib/xor.S

diff --git a/arch/riscv/include/asm/xor.h b/arch/riscv/include/asm/xor.h
new file mode 100644
index 000000000000..60ee0224913d
--- /dev/null
+++ b/arch/riscv/include/asm/xor.h
@@ -0,0 +1,74 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (C) 2020 SiFive
+ */
+
+#include <linux/hardirq.h>
+#include <asm-generic/xor.h>
+#ifdef CONFIG_VECTOR
+#include <asm/vector.h>
+
+extern void xor_regs_2_(unsigned long bytes, unsigned long *p1,
+			unsigned long *p2);
+extern void xor_regs_3_(unsigned long bytes, unsigned long *p1,
+			unsigned long *p2, unsigned long *p3);
+extern void xor_regs_4_(unsigned long bytes, unsigned long *p1,
+			unsigned long *p2, unsigned long *p3,
+			unsigned long *p4);
+extern void xor_regs_5_(unsigned long bytes, unsigned long *p1,
+			unsigned long *p2, unsigned long *p3, unsigned long *p4,
+			unsigned long *p5);
+
+static void xor_rvv_2(unsigned long bytes, unsigned long *p1, unsigned long *p2)
+{
+	kernel_rvv_begin();
+	xor_regs_2_(bytes, p1, p2);
+	kernel_rvv_end();
+}
+
+static void
+xor_rvv_3(unsigned long bytes, unsigned long *p1, unsigned long *p2,
+	  unsigned long *p3)
+{
+	kernel_rvv_begin();
+	xor_regs_3_(bytes, p1, p2, p3);
+	kernel_rvv_end();
+}
+
+static void
+xor_rvv_4(unsigned long bytes, unsigned long *p1, unsigned long *p2,
+	  unsigned long *p3, unsigned long *p4)
+{
+	kernel_rvv_begin();
+	xor_regs_4_(bytes, p1, p2, p3, p4);
+	kernel_rvv_end();
+}
+
+static void
+xor_rvv_5(unsigned long bytes, unsigned long *p1, unsigned long *p2,
+	  unsigned long *p3, unsigned long *p4, unsigned long *p5)
+{
+	kernel_rvv_begin();
+	xor_regs_5_(bytes, p1, p2, p3, p4, p5);
+	kernel_rvv_end();
+}
+
+static struct xor_block_template xor_block_rvv = {
+	.name = "rvv",
+	.do_2 = xor_rvv_2,
+	.do_3 = xor_rvv_3,
+	.do_4 = xor_rvv_4,
+	.do_5 = xor_rvv_5
+};
+
+extern bool has_vector;
+#undef XOR_TRY_TEMPLATES
+#define XOR_TRY_TEMPLATES           \
+	do {        \
+		xor_speed(&xor_block_8regs);    \
+		xor_speed(&xor_block_32regs);    \
+		if (has_vector) { \
+			xor_speed(&xor_block_rvv);\
+		} \
+	} while (0)
+#endif
diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
index 25d5c9664e57..acd87ac86d24 100644
--- a/arch/riscv/lib/Makefile
+++ b/arch/riscv/lib/Makefile
@@ -7,3 +7,4 @@ lib-$(CONFIG_MMU)	+= uaccess.o
 lib-$(CONFIG_64BIT)	+= tishift.o
 
 obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
+lib-$(CONFIG_VECTOR)	+= xor.o
diff --git a/arch/riscv/lib/xor.S b/arch/riscv/lib/xor.S
new file mode 100644
index 000000000000..de2e234c39ed
--- /dev/null
+++ b/arch/riscv/lib/xor.S
@@ -0,0 +1,81 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (C) 2020 SiFive
+ */
+#include <linux/linkage.h>
+#include <asm-generic/export.h>
+#include <asm/asm.h>
+
+ENTRY(xor_regs_2_)
+	vsetvli a3, a0, e8, m8
+	vle8.v v0, (a1)
+	vle8.v v8, (a2)
+	sub a0, a0, a3
+	vxor.vv v16, v0, v8
+	add a2, a2, a3
+	vse8.v v16, (a1)
+	add a1, a1, a3
+	bnez a0, xor_regs_2_
+	ret
+END(xor_regs_2_)
+EXPORT_SYMBOL(xor_regs_2_)
+
+ENTRY(xor_regs_3_)
+	vsetvli a4, a0, e8, m8
+	vle8.v v0, (a1)
+	vle8.v v8, (a2)
+	sub a0, a0, a4
+	vxor.vv v0, v0, v8
+	vle8.v v16, (a3)
+	add a2, a2, a4
+	vxor.vv v16, v0, v16
+	add a3, a3, a4
+	vse8.v v16, (a1)
+	add a1, a1, a4
+	bnez a0, xor_regs_3_
+	ret
+END(xor_regs_3_)
+EXPORT_SYMBOL(xor_regs_3_)
+
+ENTRY(xor_regs_4_)
+	vsetvli a5, a0, e8, m8
+	vle8.v v0, (a1)
+	vle8.v v8, (a2)
+	sub a0, a0, a5
+	vxor.vv v0, v0, v8
+	vle8.v v16, (a3)
+	add a2, a2, a5
+	vxor.vv v0, v0, v16
+	vle8.v v24, (a4)
+	add a3, a3, a5
+	vxor.vv v16, v0, v24
+	add a4, a4, a5
+	vse8.v v16, (a1)
+	add a1, a1, a5
+	bnez a0, xor_regs_4_
+	ret
+END(xor_regs_4_)
+EXPORT_SYMBOL(xor_regs_4_)
+
+ENTRY(xor_regs_5_)
+	vsetvli a6, a0, e8, m8
+	vle8.v v0, (a1)
+	vle8.v v8, (a2)
+	sub a0, a0, a6
+	vxor.vv v0, v0, v8
+	vle8.v v16, (a3)
+	add a2, a2, a6
+	vxor.vv v0, v0, v16
+	vle8.v v24, (a4)
+	add a3, a3, a6
+	vxor.vv v0, v0, v24
+	vle8.v v8, (a5)
+	add a4, a4, a6
+	vxor.vv v16, v0, v8
+	add a5, a5, a6
+	vse8.v v16, (a1)
+	add a1, a1, a6
+	bnez a0, xor_regs_5_
+	ret
+END(xor_regs_5_)
+EXPORT_SYMBOL(xor_regs_5_)
-- 
2.31.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [RFC PATCH v8 16/21] riscv: Initialize vector registers with proper vsetvli then it can work normally
  2021-09-08 17:45 [RFC PATCH v8 00/21] riscv: Add vector ISA support Greentime Hu
                   ` (14 preceding siblings ...)
  2021-09-08 17:45 ` [RFC PATCH v8 15/21] riscv: Add vector extension XOR implementation Greentime Hu
@ 2021-09-08 17:45 ` Greentime Hu
  2021-09-08 17:45 ` [RFC PATCH v8 17/21] riscv: Optimize vector registers initialization Greentime Hu
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 55+ messages in thread
From: Greentime Hu @ 2021-09-08 17:45 UTC (permalink / raw)
  To: linux-riscv, linux-kernel, aou, palmer, paul.walmsley, vincent.chen

It may cause an illegal instruction exception if it doesn't use vsetvli
before vmv.v.i v0, 0.

Co-developed-by: Han-Kuan Chen <hankuan.chen@sifive.com>
Signed-off-by: Han-Kuan Chen <hankuan.chen@sifive.com>
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
---
 arch/riscv/kernel/head.S | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
index cf331f138142..42eb3203fa77 100644
--- a/arch/riscv/kernel/head.S
+++ b/arch/riscv/kernel/head.S
@@ -442,6 +442,7 @@ ENTRY(reset_regs)
 
 	li	t1, SR_VS
 	csrs	CSR_STATUS, t1
+	vsetvli t1, x0, e8, m1
 	vmv.v.i v0, 0
 	vmv.v.i v1, 0
 	vmv.v.i v2, 0
-- 
2.31.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [RFC PATCH v8 17/21] riscv: Optimize vector registers initialization
  2021-09-08 17:45 [RFC PATCH v8 00/21] riscv: Add vector ISA support Greentime Hu
                   ` (15 preceding siblings ...)
  2021-09-08 17:45 ` [RFC PATCH v8 16/21] riscv: Initialize vector registers with proper vsetvli then it can work normally Greentime Hu
@ 2021-09-08 17:45 ` Greentime Hu
  2021-09-08 17:45 ` [RFC PATCH v8 18/21] riscv: Fix an illegal instruction exception when accessing vlenb without enable vector first Greentime Hu
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 55+ messages in thread
From: Greentime Hu @ 2021-09-08 17:45 UTC (permalink / raw)
  To: linux-riscv, linux-kernel, aou, palmer, paul.walmsley, vincent.chen

This patch optimizes the initialization or invalidation of vector
registers. It can reduce the code sizes of vector_flush_cpu_state()
and reset_regs().

Co-developed-by: Han-Kuan Chen <hankuan.chen@sifive.com>
Signed-off-by: Han-Kuan Chen <hankuan.chen@sifive.com>
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
---
 arch/riscv/kernel/head.S               | 30 +-----------------------
 arch/riscv/kernel/kernel_mode_vector.c | 32 ++------------------------
 2 files changed, 3 insertions(+), 59 deletions(-)

diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
index 42eb3203fa77..8362d7458c6c 100644
--- a/arch/riscv/kernel/head.S
+++ b/arch/riscv/kernel/head.S
@@ -442,39 +442,11 @@ ENTRY(reset_regs)
 
 	li	t1, SR_VS
 	csrs	CSR_STATUS, t1
-	vsetvli t1, x0, e8, m1
+	vsetvli t1, x0, e8, m8
 	vmv.v.i v0, 0
-	vmv.v.i v1, 0
-	vmv.v.i v2, 0
-	vmv.v.i v3, 0
-	vmv.v.i v4, 0
-	vmv.v.i v5, 0
-	vmv.v.i v6, 0
-	vmv.v.i v7, 0
 	vmv.v.i v8, 0
-	vmv.v.i v9, 0
-	vmv.v.i v10, 0
-	vmv.v.i v11, 0
-	vmv.v.i v12, 0
-	vmv.v.i v13, 0
-	vmv.v.i v14, 0
-	vmv.v.i v15, 0
 	vmv.v.i v16, 0
-	vmv.v.i v17, 0
-	vmv.v.i v18, 0
-	vmv.v.i v19, 0
-	vmv.v.i v20, 0
-	vmv.v.i v21, 0
-	vmv.v.i v22, 0
-	vmv.v.i v23, 0
 	vmv.v.i v24, 0
-	vmv.v.i v25, 0
-	vmv.v.i v26, 0
-	vmv.v.i v27, 0
-	vmv.v.i v28, 0
-	vmv.v.i v29, 0
-	vmv.v.i v30, 0
-	vmv.v.i v31, 0
 	/* note that the caller must clear SR_VS */
 #endif /* CONFIG_VECTOR */
 
diff --git a/arch/riscv/kernel/kernel_mode_vector.c b/arch/riscv/kernel/kernel_mode_vector.c
index 108cfafe7496..b84618630edf 100644
--- a/arch/riscv/kernel/kernel_mode_vector.c
+++ b/arch/riscv/kernel/kernel_mode_vector.c
@@ -86,39 +86,11 @@ static void vector_flush_cpu_state(void)
 	long tmp;
 
 	__asm__ __volatile__ (
-		"vsetvli %0, x0, e8, m1\n"
+		"vsetvli %0, x0, e8, m8\n"
 		"vmv.v.i v0, 0\n"
-		"vmv.v.i v1, 0\n"
-		"vmv.v.i v2, 0\n"
-		"vmv.v.i v3, 0\n"
-		"vmv.v.i v4, 0\n"
-		"vmv.v.i v5, 0\n"
-		"vmv.v.i v6, 0\n"
-		"vmv.v.i v7, 0\n"
 		"vmv.v.i v8, 0\n"
-		"vmv.v.i v9, 0\n"
-		"vmv.v.i v10, 0\n"
-		"vmv.v.i v11, 0\n"
-		"vmv.v.i v12, 0\n"
-		"vmv.v.i v13, 0\n"
-		"vmv.v.i v14, 0\n"
-		"vmv.v.i v15, 0\n"
 		"vmv.v.i v16, 0\n"
-		"vmv.v.i v17, 0\n"
-		"vmv.v.i v18, 0\n"
-		"vmv.v.i v19, 0\n"
-		"vmv.v.i v20, 0\n"
-		"vmv.v.i v21, 0\n"
-		"vmv.v.i v22, 0\n"
-		"vmv.v.i v23, 0\n"
-		"vmv.v.i v24, 0\n"
-		"vmv.v.i v25, 0\n"
-		"vmv.v.i v26, 0\n"
-		"vmv.v.i v27, 0\n"
-		"vmv.v.i v28, 0\n"
-		"vmv.v.i v29, 0\n"
-		"vmv.v.i v30, 0\n"
-		"vmv.v.i v31, 0\n":"=r"(tmp)::);
+		"vmv.v.i v24, 0\n":"=r"(tmp)::);
 }
 
 /*
-- 
2.31.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [RFC PATCH v8 18/21] riscv: Fix an illegal instruction exception when accessing vlenb without enable vector first
  2021-09-08 17:45 [RFC PATCH v8 00/21] riscv: Add vector ISA support Greentime Hu
                   ` (16 preceding siblings ...)
  2021-09-08 17:45 ` [RFC PATCH v8 17/21] riscv: Optimize vector registers initialization Greentime Hu
@ 2021-09-08 17:45 ` Greentime Hu
  2021-09-08 17:45 ` [RFC PATCH v8 19/21] riscv: Allocate space for vector registers in start_thread() Greentime Hu
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 55+ messages in thread
From: Greentime Hu @ 2021-09-08 17:45 UTC (permalink / raw)
  To: linux-riscv, linux-kernel, aou, palmer, paul.walmsley, vincent.chen

It triggered an illegal instruction exception when accessing vlenb CSR
without enable vector first. To fix this issue, we should enable vector
before using it and disable vector after using it.

Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
---
 arch/riscv/include/asm/vector.h        | 2 ++
 arch/riscv/kernel/cpufeature.c         | 3 +++
 arch/riscv/kernel/kernel_mode_vector.c | 6 ++++--
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
index 5d7f14453f68..ca063c8f47f2 100644
--- a/arch/riscv/include/asm/vector.h
+++ b/arch/riscv/include/asm/vector.h
@@ -8,6 +8,8 @@
 
 #include <linux/types.h>
 
+void rvv_enable(void);
+void rvv_disable(void);
 void kernel_rvv_begin(void);
 void kernel_rvv_end(void);
 
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 7265d947d981..af984f875f60 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -22,6 +22,7 @@ static DECLARE_BITMAP(riscv_isa, RISCV_ISA_EXT_MAX) __read_mostly;
 __ro_after_init DEFINE_STATIC_KEY_FALSE(cpu_hwcap_fpu);
 #endif
 #ifdef CONFIG_VECTOR
+#include <asm/vector.h>
 bool has_vector __read_mostly;
 unsigned long riscv_vsize __read_mostly;
 #endif
@@ -158,7 +159,9 @@ void __init riscv_fill_hwcap(void)
 	if (elf_hwcap & COMPAT_HWCAP_ISA_V) {
 		has_vector = true;
 		/* There are 32 vector registers with vlenb length. */
+		rvv_enable();
 		riscv_vsize = csr_read(CSR_VLENB) * 32;
+		rvv_disable();
 	}
 #endif
 }
diff --git a/arch/riscv/kernel/kernel_mode_vector.c b/arch/riscv/kernel/kernel_mode_vector.c
index b84618630edf..0d990bd8b8dd 100644
--- a/arch/riscv/kernel/kernel_mode_vector.c
+++ b/arch/riscv/kernel/kernel_mode_vector.c
@@ -71,15 +71,17 @@ static void put_cpu_vector_context(void)
 	preempt_enable();
 }
 
-static void rvv_enable(void)
+void rvv_enable(void)
 {
 	csr_set(CSR_STATUS, SR_VS);
 }
+EXPORT_SYMBOL(rvv_enable);
 
-static void rvv_disable(void)
+void rvv_disable(void)
 {
 	csr_clear(CSR_STATUS, SR_VS);
 }
+EXPORT_SYMBOL(rvv_disable);
 
 static void vector_flush_cpu_state(void)
 {
-- 
2.31.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [RFC PATCH v8 19/21] riscv: Allocate space for vector registers in start_thread()
  2021-09-08 17:45 [RFC PATCH v8 00/21] riscv: Add vector ISA support Greentime Hu
                   ` (17 preceding siblings ...)
  2021-09-08 17:45 ` [RFC PATCH v8 18/21] riscv: Fix an illegal instruction exception when accessing vlenb without enable vector first Greentime Hu
@ 2021-09-08 17:45 ` Greentime Hu
  2021-09-08 17:45 ` [RFC PATCH v8 20/21] riscv: Optimize task switch codes of vector Greentime Hu
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 55+ messages in thread
From: Greentime Hu @ 2021-09-08 17:45 UTC (permalink / raw)
  To: linux-riscv, linux-kernel, aou, palmer, paul.walmsley, vincent.chen

It allocates memory space for vector registers in start_thread() instead of
allocating in vstate_restore() in this patch. We can allocate memory here
so that it will be more readable.

Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
---
 arch/riscv/include/asm/switch_to.h |  7 +------
 arch/riscv/kernel/process.c        | 15 +++++++++++++--
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
index de0573dad78f..b48c9c974564 100644
--- a/arch/riscv/include/asm/switch_to.h
+++ b/arch/riscv/include/asm/switch_to.h
@@ -103,12 +103,6 @@ static inline void vstate_restore(struct task_struct *task,
 {
 	if ((regs->status & SR_VS) != SR_VS_OFF) {
 		struct __riscv_v_state *vstate = &(task->thread.vstate);
-
-		/* Allocate space for vector registers. */
-		if (!vstate->datap) {
-			vstate->datap = kzalloc(riscv_vsize, GFP_ATOMIC);
-			vstate->size = riscv_vsize;
-		}
 		__vstate_restore(vstate, vstate->datap);
 		__vstate_clean(regs);
 	}
@@ -127,6 +121,7 @@ static inline void __switch_to_vector(struct task_struct *prev,
 
 #else
 #define has_vector false
+#define riscv_vsize (0)
 #define vstate_save(task, regs) do { } while (0)
 #define vstate_restore(task, regs) do { } while (0)
 #define __switch_to_vector(__prev, __next) do { } while (0)
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 0b86e9e531c9..05ff5f934e7e 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -97,7 +97,16 @@ void start_thread(struct pt_regs *regs, unsigned long pc,
 	}
 
 	if (has_vector) {
+		struct __riscv_v_state *vstate = &(current->thread.vstate);
+
+		/* Enable vector and allocate memory for vector registers. */
+		if (!vstate->datap) {
+			vstate->datap = kzalloc(riscv_vsize, GFP_KERNEL);
+			if (WARN_ON(!vstate->datap))
+				return;
+		}
 		regs->status |= SR_VS_INITIAL;
+
 		/*
 		 * Restore the initial value to the vector register
 		 * before starting the user program.
@@ -121,9 +130,11 @@ void flush_thread(void)
 	memset(&current->thread.fstate, 0, sizeof(current->thread.fstate));
 #endif
 #ifdef CONFIG_VECTOR
-	/* Reset vector state */
+	/* Reset vector state and keep datap pointer. */
 	vstate_off(current, task_pt_regs(current));
-	memset(&current->thread.vstate, 0, sizeof(current->thread.vstate));
+	memset(&current->thread.vstate, 0, RISCV_V_STATE_DATAP);
+	if (current->thread.vstate.datap)
+		memset(current->thread.vstate.datap, 0, riscv_vsize);
 #endif
 }
 
-- 
2.31.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [RFC PATCH v8 20/21] riscv: Optimize task switch codes of vector
  2021-09-08 17:45 [RFC PATCH v8 00/21] riscv: Add vector ISA support Greentime Hu
                   ` (18 preceding siblings ...)
  2021-09-08 17:45 ` [RFC PATCH v8 19/21] riscv: Allocate space for vector registers in start_thread() Greentime Hu
@ 2021-09-08 17:45 ` Greentime Hu
  2021-09-15 14:29   ` Jisheng Zhang
  2021-09-08 17:45 ` [RFC PATCH v8 21/21] riscv: Turn has_vector into a static key if VECTOR=y Greentime Hu
                   ` (2 subsequent siblings)
  22 siblings, 1 reply; 55+ messages in thread
From: Greentime Hu @ 2021-09-08 17:45 UTC (permalink / raw)
  To: linux-riscv, linux-kernel, aou, palmer, paul.walmsley, vincent.chen

This patch replacees 2 instructions with 1 instruction to do the same thing
. rs1=x0 with rd != x0 is a special form of the instruction that sets vl to
MAXVL.

Suggested-by: Andrew Waterman <andrew@sifive.com>
Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
---
 arch/riscv/kernel/vector.S | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/arch/riscv/kernel/vector.S b/arch/riscv/kernel/vector.S
index 4f0c5a166e4e..f7223c81b11a 100644
--- a/arch/riscv/kernel/vector.S
+++ b/arch/riscv/kernel/vector.S
@@ -27,8 +27,7 @@
 #define x_vl     t2
 #define x_vcsr   t3
 #define incr     t4
-#define m_one    t5
-#define status   t6
+#define status   t5
 
 ENTRY(__vstate_save)
 	li      status, SR_VS
@@ -38,8 +37,7 @@ ENTRY(__vstate_save)
 	csrr    x_vtype, CSR_VTYPE
 	csrr    x_vl, CSR_VL
 	csrr    x_vcsr, CSR_VCSR
-	li      m_one, -1
-	vsetvli incr, m_one, e8, m8
+	vsetvli incr, x0, e8, m8
 	vse8.v   v0, (datap)
 	add     datap, datap, incr
 	vse8.v   v8, (datap)
@@ -61,8 +59,7 @@ ENTRY(__vstate_restore)
 	li      status, SR_VS
 	csrs    CSR_STATUS, status
 
-	li      m_one, -1
-	vsetvli incr, m_one, e8, m8
+	vsetvli incr, x0, e8, m8
 	vle8.v   v0, (datap)
 	add     datap, datap, incr
 	vle8.v   v8, (datap)
-- 
2.31.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [RFC PATCH v8 21/21] riscv: Turn has_vector into a static key if VECTOR=y
  2021-09-08 17:45 [RFC PATCH v8 00/21] riscv: Add vector ISA support Greentime Hu
                   ` (19 preceding siblings ...)
  2021-09-08 17:45 ` [RFC PATCH v8 20/21] riscv: Optimize task switch codes of vector Greentime Hu
@ 2021-09-08 17:45 ` Greentime Hu
  2021-09-15 14:24   ` Jisheng Zhang
  2021-09-13  1:47 ` [RFC PATCH v8 00/21] riscv: Add vector ISA support Vincent Chen
  2021-09-13 17:18 ` Vineet Gupta
  22 siblings, 1 reply; 55+ messages in thread
From: Greentime Hu @ 2021-09-08 17:45 UTC (permalink / raw)
  To: linux-riscv, linux-kernel, aou, palmer, paul.walmsley, vincent.chen

Just like fpu, we can use static key for has_vector.
The has_vector check sits at hot code path: switch_to(). Currently,
has_vector is a bool variable if VECTOR=y, switch_to() checks it each time,
we can optimize out this check by turning the has_vector into a static key.

Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
---
 arch/riscv/include/asm/switch_to.h     | 10 +++++++---
 arch/riscv/kernel/cpufeature.c         |  4 ++--
 arch/riscv/kernel/kernel_mode_vector.c |  4 ++--
 arch/riscv/kernel/process.c            |  8 ++++----
 arch/riscv/kernel/signal.c             |  6 +++---
 5 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
index b48c9c974564..576204217e0f 100644
--- a/arch/riscv/include/asm/switch_to.h
+++ b/arch/riscv/include/asm/switch_to.h
@@ -71,7 +71,11 @@ static __always_inline bool has_fpu(void) { return false; }
 #endif
 
 #ifdef CONFIG_VECTOR
-extern bool has_vector;
+extern struct static_key_false cpu_hwcap_vector;
+static __always_inline bool has_vector(void)
+{
+	return static_branch_likely(&cpu_hwcap_vector);
+}
 extern unsigned long riscv_vsize;
 extern void __vstate_save(struct __riscv_v_state *save_to, void *datap);
 extern void __vstate_restore(struct __riscv_v_state *restore_from, void *datap);
@@ -120,7 +124,7 @@ static inline void __switch_to_vector(struct task_struct *prev,
 }
 
 #else
-#define has_vector false
+static __always_inline bool has_vector(void) { return false; }
 #define riscv_vsize (0)
 #define vstate_save(task, regs) do { } while (0)
 #define vstate_restore(task, regs) do { } while (0)
@@ -136,7 +140,7 @@ do {							\
 	struct task_struct *__next = (next);		\
 	if (has_fpu())					\
 		__switch_to_fpu(__prev, __next);	\
-	if (has_vector)					\
+	if (has_vector())					\
 		__switch_to_vector(__prev, __next);	\
 	((last) = __switch_to(__prev, __next));		\
 } while (0)
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index af984f875f60..0139ec20adce 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -23,7 +23,7 @@ __ro_after_init DEFINE_STATIC_KEY_FALSE(cpu_hwcap_fpu);
 #endif
 #ifdef CONFIG_VECTOR
 #include <asm/vector.h>
-bool has_vector __read_mostly;
+__ro_after_init DEFINE_STATIC_KEY_FALSE(cpu_hwcap_vector);
 unsigned long riscv_vsize __read_mostly;
 #endif
 
@@ -157,7 +157,7 @@ void __init riscv_fill_hwcap(void)
 
 #ifdef CONFIG_VECTOR
 	if (elf_hwcap & COMPAT_HWCAP_ISA_V) {
-		has_vector = true;
+		static_branch_enable(&cpu_hwcap_vector);
 		/* There are 32 vector registers with vlenb length. */
 		rvv_enable();
 		riscv_vsize = csr_read(CSR_VLENB) * 32;
diff --git a/arch/riscv/kernel/kernel_mode_vector.c b/arch/riscv/kernel/kernel_mode_vector.c
index 0d990bd8b8dd..0d08954c30af 100644
--- a/arch/riscv/kernel/kernel_mode_vector.c
+++ b/arch/riscv/kernel/kernel_mode_vector.c
@@ -110,7 +110,7 @@ static void vector_flush_cpu_state(void)
  */
 void kernel_rvv_begin(void)
 {
-	if (WARN_ON(!has_vector))
+	if (WARN_ON(!has_vector()))
 		return;
 
 	WARN_ON(!may_use_vector());
@@ -140,7 +140,7 @@ EXPORT_SYMBOL(kernel_rvv_begin);
  */
 void kernel_rvv_end(void)
 {
-	if (WARN_ON(!has_vector))
+	if (WARN_ON(!has_vector()))
 		return;
 
 	/* Invalidate vector regs */
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 05ff5f934e7e..62540815ba1c 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -96,7 +96,7 @@ void start_thread(struct pt_regs *regs, unsigned long pc,
 		fstate_restore(current, regs);
 	}
 
-	if (has_vector) {
+	if (has_vector()) {
 		struct __riscv_v_state *vstate = &(current->thread.vstate);
 
 		/* Enable vector and allocate memory for vector registers. */
@@ -141,11 +141,11 @@ void flush_thread(void)
 int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 {
 	fstate_save(src, task_pt_regs(src));
-	if (has_vector)
+	if (has_vector())
 		/* To make sure every dirty vector context is saved. */
 		vstate_save(src, task_pt_regs(src));
 	*dst = *src;
-	if (has_vector) {
+	if (has_vector()) {
 		/* Copy vector context to the forked task from parent. */
 		if ((task_pt_regs(src)->status & SR_VS) != SR_VS_OFF) {
 			dst->thread.vstate.datap = kzalloc(riscv_vsize, GFP_KERNEL);
@@ -164,7 +164,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 void arch_release_task_struct(struct task_struct *tsk)
 {
 	/* Free the vector context of datap. */
-	if (has_vector)
+	if (has_vector())
 		kfree(tsk->thread.vstate.datap);
 }
 
diff --git a/arch/riscv/kernel/signal.c b/arch/riscv/kernel/signal.c
index d30a3b588156..6a19b4b7b206 100644
--- a/arch/riscv/kernel/signal.c
+++ b/arch/riscv/kernel/signal.c
@@ -192,7 +192,7 @@ static long restore_sigcontext(struct pt_regs *regs,
 				goto invalid;
 			goto done;
 		case RVV_MAGIC:
-			if (!has_vector)
+			if (!has_vector())
 				goto invalid;
 			if (size != rvv_sc_size)
 				goto invalid;
@@ -221,7 +221,7 @@ static size_t cal_rt_frame_size(void)
 
 	frame_size = sizeof(*frame);
 
-	if (has_vector)
+	if (has_vector())
 		total_context_size += rvv_sc_size;
 	/* Preserved a __riscv_ctx_hdr for END signal context header. */
 	total_context_size += sizeof(struct __riscv_ctx_hdr);
@@ -288,7 +288,7 @@ static long setup_sigcontext(struct rt_sigframe __user *frame,
 	if (has_fpu())
 		err |= save_fp_state(regs, &sc->sc_fpregs);
 	/* Save the vector state. */
-	if (has_vector)
+	if (has_vector())
 		err |= save_v_state(regs, &sc_reserved_free_ptr);
 
 	/* Put END __riscv_ctx_hdr at the end. */
-- 
2.31.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v8 15/21] riscv: Add vector extension XOR implementation
  2021-09-08 17:45 ` [RFC PATCH v8 15/21] riscv: Add vector extension XOR implementation Greentime Hu
@ 2021-09-09  6:12   ` Christoph Hellwig
  2021-09-28  7:00     ` Greentime Hu
  2021-09-14  8:29   ` Ley Foon Tan
  1 sibling, 1 reply; 55+ messages in thread
From: Christoph Hellwig @ 2021-09-09  6:12 UTC (permalink / raw)
  To: Greentime Hu
  Cc: linux-riscv, linux-kernel, aou, palmer, paul.walmsley, vincent.chen

On Thu, Sep 09, 2021 at 01:45:27AM +0800, Greentime Hu wrote:
> +extern void xor_regs_2_(unsigned long bytes, unsigned long *p1,
> +			unsigned long *p2);
> +extern void xor_regs_3_(unsigned long bytes, unsigned long *p1,
> +			unsigned long *p2, unsigned long *p3);
> +extern void xor_regs_4_(unsigned long bytes, unsigned long *p1,
> +			unsigned long *p2, unsigned long *p3,
> +			unsigned long *p4);
> +extern void xor_regs_5_(unsigned long bytes, unsigned long *p1,
> +			unsigned long *p2, unsigned long *p3, unsigned long *p4,
> +			unsigned long *p5);

There is no need for externs on function declarations ever.

> +static void xor_rvv_2(unsigned long bytes, unsigned long *p1, unsigned long *p2)
> +{
> +	kernel_rvv_begin();
> +	xor_regs_2_(bytes, p1, p2);
> +	kernel_rvv_end();
> +}

This looks strange.  Why these wrappers?

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v8 13/21] riscv: Add support for kernel mode vector
  2021-09-08 17:45 ` [RFC PATCH v8 13/21] riscv: Add support for kernel mode vector Greentime Hu
@ 2021-09-09  6:17   ` Christoph Hellwig
  0 siblings, 0 replies; 55+ messages in thread
From: Christoph Hellwig @ 2021-09-09  6:17 UTC (permalink / raw)
  To: Greentime Hu
  Cc: linux-riscv, linux-kernel, aou, palmer, paul.walmsley, vincent.chen

On Thu, Sep 09, 2021 at 01:45:25AM +0800, Greentime Hu wrote:
> +obj-$(CONFIG_VECTOR)		+= kernel_mode_vector.o
> +riscv-march-cflags-$(CONFIG_ARCH_RV32I)		:= rv32ima
> +riscv-march-cflags-$(CONFIG_ARCH_RV64I)		:= rv64ima
> +riscv-march-cflags-$(CONFIG_RISCV_ISA_C)	:= $(riscv-march-cflags-y)c
> +riscv-march-cflags-$(CONFIG_VECTOR)		:= $(riscv-march-cflags-y)v
> +CFLAGS_kernel_mode_vector.o	+= -march=$(riscv-march-cflags-y)

Do we need a helper in arch/riscv/Makefile to define the vector flags
instead of open coding them where used?  Also I think the variable
name should include vector in it.


> +EXPORT_SYMBOL(kernel_rvv_begin);

> +EXPORT_SYMBOL(kernel_rvv_end);

This needs to be EXPORT_SYMBOL_GPL just like x86 kernel_fpu_begin/end

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v8 00/21] riscv: Add vector ISA support
  2021-09-08 17:45 [RFC PATCH v8 00/21] riscv: Add vector ISA support Greentime Hu
                   ` (20 preceding siblings ...)
  2021-09-08 17:45 ` [RFC PATCH v8 21/21] riscv: Turn has_vector into a static key if VECTOR=y Greentime Hu
@ 2021-09-13  1:47 ` Vincent Chen
  2021-09-13 17:18 ` Vineet Gupta
  22 siblings, 0 replies; 55+ messages in thread
From: Vincent Chen @ 2021-09-13  1:47 UTC (permalink / raw)
  To: Greentime Hu
  Cc: linux-riscv, linux-kernel@vger.kernel.org List, Albert Ou,
	Palmer Dabbelt, Paul Walmsley

Hi all,

The associated Glibc vector patches could be found here
https://sourceware.org/pipermail/libc-alpha/2021-September/130897.html
Thanks

On Thu, Sep 9, 2021 at 1:45 AM Greentime Hu <greentime.hu@sifive.com> wrote:
>
> This patchset is implemented based on vector 1.0-rc1 spec to add vector
> support in riscv Linux kernel. To make this happen, we defined a new
> structure __riscv_v_state to save the vector related registers. It is used
> for both kernel space and user space.
>
>  - In kernel space, the datap pointer in __riscv_v_state will be allocated
>    dynamically to save vector registers.
>  - In user space,
>         - In signal handler of user space, datap will point to the address
>           of the __riscv_v_state data structure to save vector
>           registers in stack. We also create a __reserved[] array for
>           future extensions.
>         - In ptrace, the data will be put in ubuf in which we use
>           riscv_vr_get()/riscv_vr_set() to get or set the
>           __riscv_v_state data structure from/to it, datap pointer
>           would be zeroed and vector registers will be copied to the
>           address right after the __riscv_v_state structure in ubuf.
>
> This patchset also adds support for kernel mode vector, kernel XOR
> implementation with vector ISA and includes several bug fixes and code
> refinement.
>
> This patchset is rebased to v5.14 and it is tested by running several
> vector programs simultaneously. It also can get the correct ucontext_t in
> signal handler and restore correct context after sigreturn. It is also
> tested with ptrace() syscall to use PTRACE_GETREGSET/PTRACE_SETREGSET to
> get/set the vector registers. I have tested vlen=128 and vlen=256 cases in
> qemu-system-riscv64 provided by Frank Chang.
>
> We have sent patches to glibc mailing list for ifunc support and sigcontext
> changes. We will send patches for vector support to glibc mailing list
> recently.
>
>  [1] https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc
>
> ---
> Changelog V8
>  - Rebase to v5.14
>  - Refine struct __riscv_v_state with struct __riscv_ctx_hdr
>  - Refine has_vector into a static key
>  - Defined __reserved space in struct sigcontext for vector and future extensions
>
> Changelog V7
>  - Add support for kernel mode vector
>  - Add vector extension XOR implementation
>  - Optimize task switch codes of vector
>  - Allocate space for vector registers in start_thread()
>  - Fix an illegal instruction exception when accessing vlenb
>  - Optimize vector registers initialization
>  - Initialize vector registers with proper vsetvli then it can work normally
>  - Refine ptrace porting due to generic API changed
>  - Code clean up
>
> Changelog V6
>  - Replace vle.v/vse.v instructions with vle8.v/vse8.v based on 0.9 spec
>  - Add comments based on mailinglist feedback
>  - Fix rv32 build error
>
> Changelog V5
>  - Using regset_size() correctly in generic ptrace
>  - Fix the ptrace porting
>  - Fix compile warning
>
> Changelog V4
>  - Support dynamic vlen
>  - Fix bugs: lazy save/resotre, not saving vtype
>  - Update VS bit offset based on latest vector spec
>  - Add new vector csr based on latest vector spec
>  - Code refine and removed unused macros
>
> Changelog V3
>  - Rebase linux-5.6-rc3 and tested with qemu
>  - Seperate patches with Anup's advice
>  - Give out a ABI puzzle with unlimited vlen
>
> Changelog V2
>  - Fixup typo "vecotr, fstate_save->vstate_save".
>  - Fixup wrong saved registers' length in vector.S.
>  - Seperate unrelated patches from this one.
>
> Greentime Hu (15):
>   riscv: Add new csr defines related to vector extension
>   riscv: Add has_vector/riscv_vsize to save vector features.
>   riscv: Add vector struct and assembler definitions
>   riscv: Add task switch support for vector
>   riscv: Add ptrace vector support
>   riscv: Add sigcontext save/restore for vector
>   riscv: Add support for kernel mode vector
>   riscv: Use CSR_STATUS to replace sstatus in vector.S
>   riscv: Add vector extension XOR implementation
>   riscv: Initialize vector registers with proper vsetvli then it can
>     work normally
>   riscv: Optimize vector registers initialization
>   riscv: Fix an illegal instruction exception when accessing vlenb
>     without enable vector first
>   riscv: Allocate space for vector registers in start_thread()
>   riscv: Optimize task switch codes of vector
>   riscv: Turn has_vector into a static key if VECTOR=y
>
> Guo Ren (5):
>   riscv: Separate patch for cflags and aflags
>   riscv: Rename __switch_to_aux -> fpu
>   riscv: Extending cpufeature.c to detect V-extension
>   riscv: Add vector feature to compile
>   riscv: Reset vector register
>
> Vincent Chen (1):
>   riscv: signal: Report signal frame size to userspace via auxv
>
>  arch/riscv/Kconfig                       |   9 ++
>  arch/riscv/Makefile                      |  19 ++-
>  arch/riscv/include/asm/csr.h             |  16 ++-
>  arch/riscv/include/asm/elf.h             |  41 +++---
>  arch/riscv/include/asm/processor.h       |   3 +
>  arch/riscv/include/asm/switch_to.h       |  71 +++++++++-
>  arch/riscv/include/asm/vector.h          |  16 +++
>  arch/riscv/include/asm/xor.h             |  74 ++++++++++
>  arch/riscv/include/uapi/asm/auxvec.h     |   1 +
>  arch/riscv/include/uapi/asm/hwcap.h      |   1 +
>  arch/riscv/include/uapi/asm/ptrace.h     |  25 ++++
>  arch/riscv/include/uapi/asm/sigcontext.h |  24 ++++
>  arch/riscv/kernel/Makefile               |   7 +
>  arch/riscv/kernel/asm-offsets.c          |   8 ++
>  arch/riscv/kernel/cpufeature.c           |  16 +++
>  arch/riscv/kernel/entry.S                |   6 +-
>  arch/riscv/kernel/head.S                 |  22 ++-
>  arch/riscv/kernel/kernel_mode_vector.c   | 158 +++++++++++++++++++++
>  arch/riscv/kernel/process.c              |  49 +++++++
>  arch/riscv/kernel/ptrace.c               |  71 ++++++++++
>  arch/riscv/kernel/setup.c                |   4 +
>  arch/riscv/kernel/signal.c               | 172 ++++++++++++++++++++++-
>  arch/riscv/kernel/vector.S               |  81 +++++++++++
>  arch/riscv/lib/Makefile                  |   1 +
>  arch/riscv/lib/xor.S                     |  81 +++++++++++
>  include/uapi/linux/elf.h                 |   1 +
>  26 files changed, 941 insertions(+), 36 deletions(-)
>  create mode 100644 arch/riscv/include/asm/vector.h
>  create mode 100644 arch/riscv/include/asm/xor.h
>  create mode 100644 arch/riscv/kernel/kernel_mode_vector.c
>  create mode 100644 arch/riscv/kernel/vector.S
>  create mode 100644 arch/riscv/lib/xor.S
>
> --
> 2.31.1
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v8 09/21] riscv: Add task switch support for vector
  2021-09-08 17:45 ` [RFC PATCH v8 09/21] riscv: Add task switch support for vector Greentime Hu
@ 2021-09-13 12:21   ` Darius Rad
  2021-09-28 14:56     ` Greentime Hu
  0 siblings, 1 reply; 55+ messages in thread
From: Darius Rad @ 2021-09-13 12:21 UTC (permalink / raw)
  To: Greentime Hu, linux-riscv, linux-kernel, aou, palmer,
	paul.walmsley, vincent.chen

On 9/8/21 1:45 PM, Greentime Hu wrote:
> This patch adds task switch support for vector. It supports partial lazy
> save and restore mechanism. It also supports all lengths of vlen.
> 
> [guoren@linux.alibaba.com: First available porting to support vector
> context switching]
> [nick.knight@sifive.com: Rewrite vector.S to support dynamic vlen, xlen and
> code refine]
> [vincent.chen@sifive.co: Fix the might_sleep issue in vstate_save,
> vstate_restore]
> Co-developed-by: Nick Knight <nick.knight@sifive.com>
> Signed-off-by: Nick Knight <nick.knight@sifive.com>
> Co-developed-by: Guo Ren <guoren@linux.alibaba.com>
> Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
> Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
> Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
> Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
> ---
>   arch/riscv/include/asm/switch_to.h | 66 +++++++++++++++++++++++
>   arch/riscv/kernel/Makefile         |  1 +
>   arch/riscv/kernel/process.c        | 38 ++++++++++++++
>   arch/riscv/kernel/vector.S         | 84 ++++++++++++++++++++++++++++++
>   4 files changed, 189 insertions(+)
>   create mode 100644 arch/riscv/kernel/vector.S
> 
> diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
> index ec83770b3d98..de0573dad78f 100644
> --- a/arch/riscv/include/asm/switch_to.h
> +++ b/arch/riscv/include/asm/switch_to.h
> @@ -7,10 +7,12 @@
>   #define _ASM_RISCV_SWITCH_TO_H
>   
>   #include <linux/jump_label.h>
> +#include <linux/slab.h>
>   #include <linux/sched/task_stack.h>
>   #include <asm/processor.h>
>   #include <asm/ptrace.h>
>   #include <asm/csr.h>
> +#include <asm/asm-offsets.h>
>   
>   #ifdef CONFIG_FPU
>   extern void __fstate_save(struct task_struct *save_to);
> @@ -68,6 +70,68 @@ static __always_inline bool has_fpu(void) { return false; }
>   #define __switch_to_fpu(__prev, __next) do { } while (0)
>   #endif
>   
> +#ifdef CONFIG_VECTOR
> +extern bool has_vector;
> +extern unsigned long riscv_vsize;
> +extern void __vstate_save(struct __riscv_v_state *save_to, void *datap);
> +extern void __vstate_restore(struct __riscv_v_state *restore_from, void *datap);
> +
> +static inline void __vstate_clean(struct pt_regs *regs)
> +{
> +	regs->status = (regs->status & ~(SR_VS)) | SR_VS_CLEAN;
> +}
> +
> +static inline void vstate_off(struct task_struct *task,
> +			      struct pt_regs *regs)
> +{
> +	regs->status = (regs->status & ~SR_VS) | SR_VS_OFF;
> +}
> +
> +static inline void vstate_save(struct task_struct *task,
> +			       struct pt_regs *regs)
> +{
> +	if ((regs->status & SR_VS) == SR_VS_DIRTY) {
> +		struct __riscv_v_state *vstate = &(task->thread.vstate);
> +
> +		__vstate_save(vstate, vstate->datap);
> +		__vstate_clean(regs);
> +	}
> +}
> +
> +static inline void vstate_restore(struct task_struct *task,
> +				  struct pt_regs *regs)
> +{
> +	if ((regs->status & SR_VS) != SR_VS_OFF) {
> +		struct __riscv_v_state *vstate = &(task->thread.vstate);
> +
> +		/* Allocate space for vector registers. */
> +		if (!vstate->datap) {
> +			vstate->datap = kzalloc(riscv_vsize, GFP_ATOMIC);
> +			vstate->size = riscv_vsize;
> +		}
> +		__vstate_restore(vstate, vstate->datap);
> +		__vstate_clean(regs);
> +	}
> +}
> +
> +static inline void __switch_to_vector(struct task_struct *prev,
> +				   struct task_struct *next)
> +{
> +	struct pt_regs *regs;
> +
> +	regs = task_pt_regs(prev);
> +	if (unlikely(regs->status & SR_SD))
> +		vstate_save(prev, regs);
> +	vstate_restore(next, task_pt_regs(next));
> +}
> +
> +#else
> +#define has_vector false
> +#define vstate_save(task, regs) do { } while (0)
> +#define vstate_restore(task, regs) do { } while (0)
> +#define __switch_to_vector(__prev, __next) do { } while (0)
> +#endif
> +
>   extern struct task_struct *__switch_to(struct task_struct *,
>   				       struct task_struct *);
>   
> @@ -77,6 +141,8 @@ do {							\
>   	struct task_struct *__next = (next);		\
>   	if (has_fpu())					\
>   		__switch_to_fpu(__prev, __next);	\
> +	if (has_vector)					\
> +		__switch_to_vector(__prev, __next);	\
>   	((last) = __switch_to(__prev, __next));		\
>   } while (0)
>   
> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> index 3397ddac1a30..344078080839 100644
> --- a/arch/riscv/kernel/Makefile
> +++ b/arch/riscv/kernel/Makefile
> @@ -40,6 +40,7 @@ obj-$(CONFIG_MMU) += vdso.o vdso/
>   
>   obj-$(CONFIG_RISCV_M_MODE)	+= traps_misaligned.o
>   obj-$(CONFIG_FPU)		+= fpu.o
> +obj-$(CONFIG_VECTOR)		+= vector.o
>   obj-$(CONFIG_SMP)		+= smpboot.o
>   obj-$(CONFIG_SMP)		+= smp.o
>   obj-$(CONFIG_SMP)		+= cpu_ops.o
> diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> index 03ac3aa611f5..0b86e9e531c9 100644
> --- a/arch/riscv/kernel/process.c
> +++ b/arch/riscv/kernel/process.c
> @@ -95,6 +95,16 @@ void start_thread(struct pt_regs *regs, unsigned long pc,
>   		 */
>   		fstate_restore(current, regs);
>   	}
> +
> +	if (has_vector) {
> +		regs->status |= SR_VS_INITIAL;
> +		/*
> +		 * Restore the initial value to the vector register
> +		 * before starting the user program.
> +		 */
> +		vstate_restore(current, regs);
> +	}
> +

So this will unconditionally enable vector instructions, and allocate 
memory for vector state, for all processes, regardless of whether vector 
instructions are used?

Given the size of the vector state and potential power and performance 
implications of enabling the vector engine, it seems like this should 
treated similarly to Intel AMX on x86.  The full discussion of that is 
here:

https://lore.kernel.org/lkml/CALCETrW2QHa2TLvnUuVxAAheqcbSZ-5_WRXtDSAGcbG8N+gtdQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org/

The cover letter for recent Intel AMX patches has a summary of the x86 
implementation:

https://lore.kernel.org/lkml/20210825155413.19673-1-chang.seok.bae@intel.com/

If RISC-V were to adopt a similar approach, I think the significant 
points are:

  1. A process (or thread) must specifically request the desire to use
vector extensions (perhaps with some new arch_prctl() API),

  2. The kernel is free to deny permission, perhaps based on 
administrative rules or for other reasons, and

  3. If a process attempts to use vector extensions before doing the 
above, the process will die due to an illegal instruction.


>   	regs->epc = pc;
>   	regs->sp = sp;
>   }
> @@ -110,15 +120,43 @@ void flush_thread(void)
>   	fstate_off(current, task_pt_regs(current));
>   	memset(&current->thread.fstate, 0, sizeof(current->thread.fstate));
>   #endif
> +#ifdef CONFIG_VECTOR
> +	/* Reset vector state */
> +	vstate_off(current, task_pt_regs(current));
> +	memset(&current->thread.vstate, 0, sizeof(current->thread.vstate));
> +#endif
>   }
>   
>   int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
>   {
>   	fstate_save(src, task_pt_regs(src));
> +	if (has_vector)
> +		/* To make sure every dirty vector context is saved. */
> +		vstate_save(src, task_pt_regs(src));
>   	*dst = *src;
> +	if (has_vector) {
> +		/* Copy vector context to the forked task from parent. */
> +		if ((task_pt_regs(src)->status & SR_VS) != SR_VS_OFF) {
> +			dst->thread.vstate.datap = kzalloc(riscv_vsize, GFP_KERNEL);
> +			/* Failed to allocate memory. */
> +			if (!dst->thread.vstate.datap)
> +				return -ENOMEM;
> +			/* Copy the src vector context to dst. */
> +			memcpy(dst->thread.vstate.datap,
> +			       src->thread.vstate.datap, riscv_vsize);
> +		}
> +	}
> +
>   	return 0;
>   }
>   
> +void arch_release_task_struct(struct task_struct *tsk)
> +{
> +	/* Free the vector context of datap. */
> +	if (has_vector)
> +		kfree(tsk->thread.vstate.datap);
> +}
> +
>   int copy_thread(unsigned long clone_flags, unsigned long usp, unsigned long arg,
>   		struct task_struct *p, unsigned long tls)
>   {
> diff --git a/arch/riscv/kernel/vector.S b/arch/riscv/kernel/vector.S
> new file mode 100644
> index 000000000000..4c880b1c32aa
> --- /dev/null
> +++ b/arch/riscv/kernel/vector.S
> @@ -0,0 +1,84 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2012 Regents of the University of California
> + * Copyright (C) 2017 SiFive
> + * Copyright (C) 2019 Alibaba Group Holding Limited
> + *
> + *   This program is free software; you can redistribute it and/or
> + *   modify it under the terms of the GNU General Public License
> + *   as published by the Free Software Foundation, version 2.
> + *
> + *   This program is distributed in the hope that it will be useful,
> + *   but WITHOUT ANY WARRANTY; without even the implied warranty of
> + *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + *   GNU General Public License for more details.
> + */
> +
> +#include <linux/linkage.h>
> +
> +#include <asm/asm.h>
> +#include <asm/csr.h>
> +#include <asm/asm-offsets.h>
> +
> +#define vstatep  a0
> +#define datap    a1
> +#define x_vstart t0
> +#define x_vtype  t1
> +#define x_vl     t2
> +#define x_vcsr   t3
> +#define incr     t4
> +#define m_one    t5
> +#define status   t6
> +
> +ENTRY(__vstate_save)
> +	li      status, SR_VS
> +	csrs    sstatus, status
> +
> +	csrr    x_vstart, CSR_VSTART
> +	csrr    x_vtype, CSR_VTYPE
> +	csrr    x_vl, CSR_VL
> +	csrr    x_vcsr, CSR_VCSR
> +	li      m_one, -1
> +	vsetvli incr, m_one, e8, m8
> +	vse8.v   v0, (datap)
> +	add     datap, datap, incr
> +	vse8.v   v8, (datap)
> +	add     datap, datap, incr
> +	vse8.v   v16, (datap)
> +	add     datap, datap, incr
> +	vse8.v   v24, (datap)
> +
> +	REG_S   x_vstart, RISCV_V_STATE_VSTART(vstatep)
> +	REG_S   x_vtype, RISCV_V_STATE_VTYPE(vstatep)
> +	REG_S   x_vl, RISCV_V_STATE_VL(vstatep)
> +	REG_S   x_vcsr, RISCV_V_STATE_VCSR(vstatep)
> +
> +	csrc	sstatus, status
> +	ret
> +ENDPROC(__vstate_save)
> +
> +ENTRY(__vstate_restore)
> +	li      status, SR_VS
> +	csrs    sstatus, status
> +
> +	li      m_one, -1
> +	vsetvli incr, m_one, e8, m8
> +	vle8.v   v0, (datap)
> +	add     datap, datap, incr
> +	vle8.v   v8, (datap)
> +	add     datap, datap, incr
> +	vle8.v   v16, (datap)
> +	add     datap, datap, incr
> +	vle8.v   v24, (datap)
> +
> +	REG_L   x_vstart, RISCV_V_STATE_VSTART(vstatep)
> +	REG_L   x_vtype, RISCV_V_STATE_VTYPE(vstatep)
> +	REG_L   x_vl, RISCV_V_STATE_VL(vstatep)
> +	REG_L   x_vcsr, RISCV_V_STATE_VCSR(vstatep)
> +	vsetvl  x0, x_vl, x_vtype
> +	csrw    CSR_VSTART, x_vstart
> +	csrw    CSR_VCSR, x_vcsr
> +
> +	csrc	sstatus, status
> +	ret
> +ENDPROC(__vstate_restore)
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v8 00/21] riscv: Add vector ISA support
  2021-09-08 17:45 [RFC PATCH v8 00/21] riscv: Add vector ISA support Greentime Hu
                   ` (21 preceding siblings ...)
  2021-09-13  1:47 ` [RFC PATCH v8 00/21] riscv: Add vector ISA support Vincent Chen
@ 2021-09-13 17:18 ` Vineet Gupta
  22 siblings, 0 replies; 55+ messages in thread
From: Vineet Gupta @ 2021-09-13 17:18 UTC (permalink / raw)
  To: Greentime Hu, linux-riscv, linux-kernel, aou, palmer,
	paul.walmsley, vincent.chen

On 9/8/21 10:45 AM, Greentime Hu wrote:
> This patchset is implemented based on vector 1.0-rc1 spec to add vector
> support in riscv Linux kernel. To make this happen, we defined a new
> structure __riscv_v_state to save the vector related registers. It is used
> for both kernel space and user space.
> 
>   - In kernel space, the datap pointer in __riscv_v_state will be allocated
>     dynamically to save vector registers.
>   - In user space,
> 	- In signal handler of user space, datap will point to the address
>            of the __riscv_v_state data structure to save vector
>            registers in stack. We also create a __reserved[] array for
> 	  future extensions.
> 	- In ptrace, the data will be put in ubuf in which we use
>         	  riscv_vr_get()/riscv_vr_set() to get or set the
> 	  __riscv_v_state data structure from/to it, datap pointer
> 	  would be zeroed and vector registers will be copied to the
> 	  address right after the __riscv_v_state structure in ubuf.
> 
> This patchset also adds support for kernel mode vector, kernel XOR
> implementation with vector ISA and includes several bug fixes and code
> refinement.
> 
> This patchset is rebased to v5.14 and it is tested by running several
> vector programs simultaneously. It also can get the correct ucontext_t in
> signal handler and restore correct context after sigreturn. It is also
> tested with ptrace() syscall to use PTRACE_GETREGSET/PTRACE_SETREGSET to
> get/set the vector registers. I have tested vlen=128 and vlen=256 cases in
> qemu-system-riscv64 provided by Frank Chang.

Are QEMU/Spike changes available somewhere publicly for people to try this ?

Thx,
-Vineet

> 
> We have sent patches to glibc mailing list for ifunc support and sigcontext
> changes. We will send patches for vector support to glibc mailing list
> recently.
> 
>   [1] https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc
> 
> ---
> Changelog V8
>   - Rebase to v5.14
>   - Refine struct __riscv_v_state with struct __riscv_ctx_hdr
>   - Refine has_vector into a static key
>   - Defined __reserved space in struct sigcontext for vector and future extensions
> 
> Changelog V7
>   - Add support for kernel mode vector
>   - Add vector extension XOR implementation
>   - Optimize task switch codes of vector
>   - Allocate space for vector registers in start_thread()
>   - Fix an illegal instruction exception when accessing vlenb
>   - Optimize vector registers initialization
>   - Initialize vector registers with proper vsetvli then it can work normally
>   - Refine ptrace porting due to generic API changed
>   - Code clean up
> 
> Changelog V6
>   - Replace vle.v/vse.v instructions with vle8.v/vse8.v based on 0.9 spec
>   - Add comments based on mailinglist feedback
>   - Fix rv32 build error
> 
> Changelog V5
>   - Using regset_size() correctly in generic ptrace
>   - Fix the ptrace porting
>   - Fix compile warning
> 
> Changelog V4
>   - Support dynamic vlen
>   - Fix bugs: lazy save/resotre, not saving vtype
>   - Update VS bit offset based on latest vector spec
>   - Add new vector csr based on latest vector spec
>   - Code refine and removed unused macros
> 
> Changelog V3
>   - Rebase linux-5.6-rc3 and tested with qemu
>   - Seperate patches with Anup's advice
>   - Give out a ABI puzzle with unlimited vlen
> 
> Changelog V2
>   - Fixup typo "vecotr, fstate_save->vstate_save".
>   - Fixup wrong saved registers' length in vector.S.
>   - Seperate unrelated patches from this one.
> 
> Greentime Hu (15):
>    riscv: Add new csr defines related to vector extension
>    riscv: Add has_vector/riscv_vsize to save vector features.
>    riscv: Add vector struct and assembler definitions
>    riscv: Add task switch support for vector
>    riscv: Add ptrace vector support
>    riscv: Add sigcontext save/restore for vector
>    riscv: Add support for kernel mode vector
>    riscv: Use CSR_STATUS to replace sstatus in vector.S
>    riscv: Add vector extension XOR implementation
>    riscv: Initialize vector registers with proper vsetvli then it can
>      work normally
>    riscv: Optimize vector registers initialization
>    riscv: Fix an illegal instruction exception when accessing vlenb
>      without enable vector first
>    riscv: Allocate space for vector registers in start_thread()
>    riscv: Optimize task switch codes of vector
>    riscv: Turn has_vector into a static key if VECTOR=y
> 
> Guo Ren (5):
>    riscv: Separate patch for cflags and aflags
>    riscv: Rename __switch_to_aux -> fpu
>    riscv: Extending cpufeature.c to detect V-extension
>    riscv: Add vector feature to compile
>    riscv: Reset vector register
> 
> Vincent Chen (1):
>    riscv: signal: Report signal frame size to userspace via auxv
> 
>   arch/riscv/Kconfig                       |   9 ++
>   arch/riscv/Makefile                      |  19 ++-
>   arch/riscv/include/asm/csr.h             |  16 ++-
>   arch/riscv/include/asm/elf.h             |  41 +++---
>   arch/riscv/include/asm/processor.h       |   3 +
>   arch/riscv/include/asm/switch_to.h       |  71 +++++++++-
>   arch/riscv/include/asm/vector.h          |  16 +++
>   arch/riscv/include/asm/xor.h             |  74 ++++++++++
>   arch/riscv/include/uapi/asm/auxvec.h     |   1 +
>   arch/riscv/include/uapi/asm/hwcap.h      |   1 +
>   arch/riscv/include/uapi/asm/ptrace.h     |  25 ++++
>   arch/riscv/include/uapi/asm/sigcontext.h |  24 ++++
>   arch/riscv/kernel/Makefile               |   7 +
>   arch/riscv/kernel/asm-offsets.c          |   8 ++
>   arch/riscv/kernel/cpufeature.c           |  16 +++
>   arch/riscv/kernel/entry.S                |   6 +-
>   arch/riscv/kernel/head.S                 |  22 ++-
>   arch/riscv/kernel/kernel_mode_vector.c   | 158 +++++++++++++++++++++
>   arch/riscv/kernel/process.c              |  49 +++++++
>   arch/riscv/kernel/ptrace.c               |  71 ++++++++++
>   arch/riscv/kernel/setup.c                |   4 +
>   arch/riscv/kernel/signal.c               | 172 ++++++++++++++++++++++-
>   arch/riscv/kernel/vector.S               |  81 +++++++++++
>   arch/riscv/lib/Makefile                  |   1 +
>   arch/riscv/lib/xor.S                     |  81 +++++++++++
>   include/uapi/linux/elf.h                 |   1 +
>   26 files changed, 941 insertions(+), 36 deletions(-)
>   create mode 100644 arch/riscv/include/asm/vector.h
>   create mode 100644 arch/riscv/include/asm/xor.h
>   create mode 100644 arch/riscv/kernel/kernel_mode_vector.c
>   create mode 100644 arch/riscv/kernel/vector.S
>   create mode 100644 arch/riscv/lib/xor.S
> 


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v8 15/21] riscv: Add vector extension XOR implementation
  2021-09-08 17:45 ` [RFC PATCH v8 15/21] riscv: Add vector extension XOR implementation Greentime Hu
  2021-09-09  6:12   ` Christoph Hellwig
@ 2021-09-14  8:29   ` Ley Foon Tan
  2021-09-28  7:01     ` Greentime Hu
  1 sibling, 1 reply; 55+ messages in thread
From: Ley Foon Tan @ 2021-09-14  8:29 UTC (permalink / raw)
  To: Greentime Hu
  Cc: linux-riscv, Linux Kernel Mailing List, aou, Palmer Dabbelt,
	Paul Walmsley, vincent.chen

On Thu, Sep 9, 2021 at 1:49 AM Greentime Hu <greentime.hu@sifive.com> wrote:
>
> This patch adds support for vector optimized XOR it is tested in spike and
> qemu.
>
> Logs in spike:
> [    0.008365] xor: measuring software checksum speed
> [    0.048885]    8regs     :  1719.000 MB/sec
> [    0.089080]    32regs    :  1717.000 MB/sec
> [    0.129275]    rvv       :  7043.000 MB/sec
> [    0.129525] xor: using function: rvv (7043.000 MB/sec)
>
> Logs in qemu:
> [    0.098943] xor: measuring software checksum speed
> [    0.139391]    8regs     :  2911.000 MB/sec
> [    0.181079]    32regs    :  2813.000 MB/sec
> [    0.224260]    rvv       :    45.000 MB/sec
> [    0.225586] xor: using function: 8regs (2911.000 MB/sec)
>
> Co-developed-by: Han-Kuan Chen <hankuan.chen@sifive.com>
> Signed-off-by: Han-Kuan Chen <hankuan.chen@sifive.com>
> Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
> ---
>  arch/riscv/include/asm/xor.h | 74 ++++++++++++++++++++++++++++++++
>  arch/riscv/lib/Makefile      |  1 +
>  arch/riscv/lib/xor.S         | 81 ++++++++++++++++++++++++++++++++++++
>  3 files changed, 156 insertions(+)
>  create mode 100644 arch/riscv/include/asm/xor.h
>  create mode 100644 arch/riscv/lib/xor.S
>
> diff --git a/arch/riscv/include/asm/xor.h b/arch/riscv/include/asm/xor.h
> new file mode 100644
> index 000000000000..60ee0224913d
> --- /dev/null
> +++ b/arch/riscv/include/asm/xor.h
> @@ -0,0 +1,74 @@


[...]

>
> +extern bool has_vector;
> +#undef XOR_TRY_TEMPLATES
> +#define XOR_TRY_TEMPLATES           \
> +       do {        \
> +               xor_speed(&xor_block_8regs);    \
> +               xor_speed(&xor_block_32regs);    \
> +               if (has_vector) { \
> +                       xor_speed(&xor_block_rvv);\
> +               } \
> +       } while (0)
> +#endif
>
bool has_vector is changed to has_vector() function now, should this
change as well?


Regards
Ley Foon

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v8 21/21] riscv: Turn has_vector into a static key if VECTOR=y
  2021-09-08 17:45 ` [RFC PATCH v8 21/21] riscv: Turn has_vector into a static key if VECTOR=y Greentime Hu
@ 2021-09-15 14:24   ` Jisheng Zhang
  2021-10-04 15:04     ` Greentime Hu
  0 siblings, 1 reply; 55+ messages in thread
From: Jisheng Zhang @ 2021-09-15 14:24 UTC (permalink / raw)
  To: Greentime Hu
  Cc: linux-riscv, linux-kernel, aou, palmer, paul.walmsley, vincent.chen

On Thu,  9 Sep 2021 01:45:33 +0800
Greentime Hu <greentime.hu@sifive.com> wrote:

> Just like fpu, we can use static key for has_vector.
> The has_vector check sits at hot code path: switch_to(). Currently,
> has_vector is a bool variable if VECTOR=y, switch_to() checks it each time,
> we can optimize out this check by turning the has_vector into a static key.
> 

has_vector is newly introduced in this patch set so I believe this patch can
be folded into has_vector introducing patch, I.E patch 6


> Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
> ---
>  arch/riscv/include/asm/switch_to.h     | 10 +++++++---
>  arch/riscv/kernel/cpufeature.c         |  4 ++--
>  arch/riscv/kernel/kernel_mode_vector.c |  4 ++--
>  arch/riscv/kernel/process.c            |  8 ++++----
>  arch/riscv/kernel/signal.c             |  6 +++---
>  5 files changed, 18 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
> index b48c9c974564..576204217e0f 100644
> --- a/arch/riscv/include/asm/switch_to.h
> +++ b/arch/riscv/include/asm/switch_to.h
> @@ -71,7 +71,11 @@ static __always_inline bool has_fpu(void) { return false; }
>  #endif
>  
>  #ifdef CONFIG_VECTOR
> -extern bool has_vector;
> +extern struct static_key_false cpu_hwcap_vector;
> +static __always_inline bool has_vector(void)
> +{
> +	return static_branch_likely(&cpu_hwcap_vector);
> +}
>  extern unsigned long riscv_vsize;
>  extern void __vstate_save(struct __riscv_v_state *save_to, void *datap);
>  extern void __vstate_restore(struct __riscv_v_state *restore_from, void *datap);
> @@ -120,7 +124,7 @@ static inline void __switch_to_vector(struct task_struct *prev,
>  }
>  
>  #else
> -#define has_vector false
> +static __always_inline bool has_vector(void) { return false; }
>  #define riscv_vsize (0)
>  #define vstate_save(task, regs) do { } while (0)
>  #define vstate_restore(task, regs) do { } while (0)
> @@ -136,7 +140,7 @@ do {							\
>  	struct task_struct *__next = (next);		\
>  	if (has_fpu())					\
>  		__switch_to_fpu(__prev, __next);	\
> -	if (has_vector)					\
> +	if (has_vector())					\
>  		__switch_to_vector(__prev, __next);	\
>  	((last) = __switch_to(__prev, __next));		\
>  } while (0)
> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> index af984f875f60..0139ec20adce 100644
> --- a/arch/riscv/kernel/cpufeature.c
> +++ b/arch/riscv/kernel/cpufeature.c
> @@ -23,7 +23,7 @@ __ro_after_init DEFINE_STATIC_KEY_FALSE(cpu_hwcap_fpu);
>  #endif
>  #ifdef CONFIG_VECTOR
>  #include <asm/vector.h>
> -bool has_vector __read_mostly;
> +__ro_after_init DEFINE_STATIC_KEY_FALSE(cpu_hwcap_vector);
>  unsigned long riscv_vsize __read_mostly;
>  #endif
>  
> @@ -157,7 +157,7 @@ void __init riscv_fill_hwcap(void)
>  
>  #ifdef CONFIG_VECTOR
>  	if (elf_hwcap & COMPAT_HWCAP_ISA_V) {
> -		has_vector = true;
> +		static_branch_enable(&cpu_hwcap_vector);
>  		/* There are 32 vector registers with vlenb length. */
>  		rvv_enable();
>  		riscv_vsize = csr_read(CSR_VLENB) * 32;
> diff --git a/arch/riscv/kernel/kernel_mode_vector.c b/arch/riscv/kernel/kernel_mode_vector.c
> index 0d990bd8b8dd..0d08954c30af 100644
> --- a/arch/riscv/kernel/kernel_mode_vector.c
> +++ b/arch/riscv/kernel/kernel_mode_vector.c
> @@ -110,7 +110,7 @@ static void vector_flush_cpu_state(void)
>   */
>  void kernel_rvv_begin(void)
>  {
> -	if (WARN_ON(!has_vector))
> +	if (WARN_ON(!has_vector()))
>  		return;
>  
>  	WARN_ON(!may_use_vector());
> @@ -140,7 +140,7 @@ EXPORT_SYMBOL(kernel_rvv_begin);
>   */
>  void kernel_rvv_end(void)
>  {
> -	if (WARN_ON(!has_vector))
> +	if (WARN_ON(!has_vector()))
>  		return;
>  
>  	/* Invalidate vector regs */
> diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> index 05ff5f934e7e..62540815ba1c 100644
> --- a/arch/riscv/kernel/process.c
> +++ b/arch/riscv/kernel/process.c
> @@ -96,7 +96,7 @@ void start_thread(struct pt_regs *regs, unsigned long pc,
>  		fstate_restore(current, regs);
>  	}
>  
> -	if (has_vector) {
> +	if (has_vector()) {
>  		struct __riscv_v_state *vstate = &(current->thread.vstate);
>  
>  		/* Enable vector and allocate memory for vector registers. */
> @@ -141,11 +141,11 @@ void flush_thread(void)
>  int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
>  {
>  	fstate_save(src, task_pt_regs(src));
> -	if (has_vector)
> +	if (has_vector())
>  		/* To make sure every dirty vector context is saved. */
>  		vstate_save(src, task_pt_regs(src));
>  	*dst = *src;
> -	if (has_vector) {
> +	if (has_vector()) {
>  		/* Copy vector context to the forked task from parent. */
>  		if ((task_pt_regs(src)->status & SR_VS) != SR_VS_OFF) {
>  			dst->thread.vstate.datap = kzalloc(riscv_vsize, GFP_KERNEL);
> @@ -164,7 +164,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
>  void arch_release_task_struct(struct task_struct *tsk)
>  {
>  	/* Free the vector context of datap. */
> -	if (has_vector)
> +	if (has_vector())
>  		kfree(tsk->thread.vstate.datap);
>  }
>  
> diff --git a/arch/riscv/kernel/signal.c b/arch/riscv/kernel/signal.c
> index d30a3b588156..6a19b4b7b206 100644
> --- a/arch/riscv/kernel/signal.c
> +++ b/arch/riscv/kernel/signal.c
> @@ -192,7 +192,7 @@ static long restore_sigcontext(struct pt_regs *regs,
>  				goto invalid;
>  			goto done;
>  		case RVV_MAGIC:
> -			if (!has_vector)
> +			if (!has_vector())
>  				goto invalid;
>  			if (size != rvv_sc_size)
>  				goto invalid;
> @@ -221,7 +221,7 @@ static size_t cal_rt_frame_size(void)
>  
>  	frame_size = sizeof(*frame);
>  
> -	if (has_vector)
> +	if (has_vector())
>  		total_context_size += rvv_sc_size;
>  	/* Preserved a __riscv_ctx_hdr for END signal context header. */
>  	total_context_size += sizeof(struct __riscv_ctx_hdr);
> @@ -288,7 +288,7 @@ static long setup_sigcontext(struct rt_sigframe __user *frame,
>  	if (has_fpu())
>  		err |= save_fp_state(regs, &sc->sc_fpregs);
>  	/* Save the vector state. */
> -	if (has_vector)
> +	if (has_vector())
>  		err |= save_v_state(regs, &sc_reserved_free_ptr);
>  
>  	/* Put END __riscv_ctx_hdr at the end. */



^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v8 20/21] riscv: Optimize task switch codes of vector
  2021-09-08 17:45 ` [RFC PATCH v8 20/21] riscv: Optimize task switch codes of vector Greentime Hu
@ 2021-09-15 14:29   ` Jisheng Zhang
  2021-10-04 14:13     ` Greentime Hu
  0 siblings, 1 reply; 55+ messages in thread
From: Jisheng Zhang @ 2021-09-15 14:29 UTC (permalink / raw)
  To: Greentime Hu
  Cc: linux-riscv, linux-kernel, aou, palmer, paul.walmsley, vincent.chen

On Thu,  9 Sep 2021 01:45:32 +0800
Greentime Hu <greentime.hu@sifive.com> wrote:

> This patch replacees 2 instructions with 1 instruction to do the same thing
> . rs1=x0 with rd != x0 is a special form of the instruction that sets vl to
> MAXVL.

Similarly, the vector.S is newly introduced in this patch set, so could
this optimization be folded into the __vstate_save and __vstate_restore
introduction patch? Or it's better to keep this optimizaion in commit log?

> 
> Suggested-by: Andrew Waterman <andrew@sifive.com>
> Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
> Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
> Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
> ---
>  arch/riscv/kernel/vector.S | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/riscv/kernel/vector.S b/arch/riscv/kernel/vector.S
> index 4f0c5a166e4e..f7223c81b11a 100644
> --- a/arch/riscv/kernel/vector.S
> +++ b/arch/riscv/kernel/vector.S
> @@ -27,8 +27,7 @@
>  #define x_vl     t2
>  #define x_vcsr   t3
>  #define incr     t4
> -#define m_one    t5
> -#define status   t6
> +#define status   t5
>  
>  ENTRY(__vstate_save)
>  	li      status, SR_VS
> @@ -38,8 +37,7 @@ ENTRY(__vstate_save)
>  	csrr    x_vtype, CSR_VTYPE
>  	csrr    x_vl, CSR_VL
>  	csrr    x_vcsr, CSR_VCSR
> -	li      m_one, -1
> -	vsetvli incr, m_one, e8, m8
> +	vsetvli incr, x0, e8, m8
>  	vse8.v   v0, (datap)
>  	add     datap, datap, incr
>  	vse8.v   v8, (datap)
> @@ -61,8 +59,7 @@ ENTRY(__vstate_restore)
>  	li      status, SR_VS
>  	csrs    CSR_STATUS, status
>  
> -	li      m_one, -1
> -	vsetvli incr, m_one, e8, m8
> +	vsetvli incr, x0, e8, m8
>  	vle8.v   v0, (datap)
>  	add     datap, datap, incr
>  	vle8.v   v8, (datap)



^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v8 15/21] riscv: Add vector extension XOR implementation
  2021-09-09  6:12   ` Christoph Hellwig
@ 2021-09-28  7:00     ` Greentime Hu
  0 siblings, 0 replies; 55+ messages in thread
From: Greentime Hu @ 2021-09-28  7:00 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-riscv, Linux Kernel Mailing List, Albert Ou,
	Palmer Dabbelt, Paul Walmsley, Vincent Chen

Christoph Hellwig <hch@infradead.org> 於 2021年9月9日 週四 下午2:12寫道:
>
> On Thu, Sep 09, 2021 at 01:45:27AM +0800, Greentime Hu wrote:
> > +extern void xor_regs_2_(unsigned long bytes, unsigned long *p1,
> > +                     unsigned long *p2);
> > +extern void xor_regs_3_(unsigned long bytes, unsigned long *p1,
> > +                     unsigned long *p2, unsigned long *p3);
> > +extern void xor_regs_4_(unsigned long bytes, unsigned long *p1,
> > +                     unsigned long *p2, unsigned long *p3,
> > +                     unsigned long *p4);
> > +extern void xor_regs_5_(unsigned long bytes, unsigned long *p1,
> > +                     unsigned long *p2, unsigned long *p3, unsigned long *p4,
> > +                     unsigned long *p5);
>
> There is no need for externs on function declarations ever.
>
Ok, I'll remove it.

> > +static void xor_rvv_2(unsigned long bytes, unsigned long *p1, unsigned long *p2)
> > +{
> > +     kernel_rvv_begin();
> > +     xor_regs_2_(bytes, p1, p2);
> > +     kernel_rvv_end();
> > +}
>
> This looks strange.  Why these wrappers?

We don't use rvv in kernel space generally. If we want to use it, we
need to save all vector registers first.
Just like arm64/x86 implementation in
arch/arm64/include/asm/xor.h
arch/x86/include/asm/xor.h

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v8 15/21] riscv: Add vector extension XOR implementation
  2021-09-14  8:29   ` Ley Foon Tan
@ 2021-09-28  7:01     ` Greentime Hu
  0 siblings, 0 replies; 55+ messages in thread
From: Greentime Hu @ 2021-09-28  7:01 UTC (permalink / raw)
  To: Ley Foon Tan
  Cc: linux-riscv, Linux Kernel Mailing List, Albert Ou,
	Palmer Dabbelt, Paul Walmsley, Vincent Chen

Ley Foon Tan <lftan.linux@gmail.com> 於 2021年9月14日 週二 下午4:30寫道:
>
> On Thu, Sep 9, 2021 at 1:49 AM Greentime Hu <greentime.hu@sifive.com> wrote:
> >
> > This patch adds support for vector optimized XOR it is tested in spike and
> > qemu.
> >
> > Logs in spike:
> > [    0.008365] xor: measuring software checksum speed
> > [    0.048885]    8regs     :  1719.000 MB/sec
> > [    0.089080]    32regs    :  1717.000 MB/sec
> > [    0.129275]    rvv       :  7043.000 MB/sec
> > [    0.129525] xor: using function: rvv (7043.000 MB/sec)
> >
> > Logs in qemu:
> > [    0.098943] xor: measuring software checksum speed
> > [    0.139391]    8regs     :  2911.000 MB/sec
> > [    0.181079]    32regs    :  2813.000 MB/sec
> > [    0.224260]    rvv       :    45.000 MB/sec
> > [    0.225586] xor: using function: 8regs (2911.000 MB/sec)
> >
> > Co-developed-by: Han-Kuan Chen <hankuan.chen@sifive.com>
> > Signed-off-by: Han-Kuan Chen <hankuan.chen@sifive.com>
> > Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
> > ---
> >  arch/riscv/include/asm/xor.h | 74 ++++++++++++++++++++++++++++++++
> >  arch/riscv/lib/Makefile      |  1 +
> >  arch/riscv/lib/xor.S         | 81 ++++++++++++++++++++++++++++++++++++
> >  3 files changed, 156 insertions(+)
> >  create mode 100644 arch/riscv/include/asm/xor.h
> >  create mode 100644 arch/riscv/lib/xor.S
> >
> > diff --git a/arch/riscv/include/asm/xor.h b/arch/riscv/include/asm/xor.h
> > new file mode 100644
> > index 000000000000..60ee0224913d
> > --- /dev/null
> > +++ b/arch/riscv/include/asm/xor.h
> > @@ -0,0 +1,74 @@
>
>
> [...]
>
> >
> > +extern bool has_vector;
> > +#undef XOR_TRY_TEMPLATES
> > +#define XOR_TRY_TEMPLATES           \
> > +       do {        \
> > +               xor_speed(&xor_block_8regs);    \
> > +               xor_speed(&xor_block_32regs);    \
> > +               if (has_vector) { \
> > +                       xor_speed(&xor_block_rvv);\
> > +               } \
> > +       } while (0)
> > +#endif
> >
> bool has_vector is changed to has_vector() function now, should this
> change as well?

That's right. Thank you, LeyFoon.
I'll merge the patch to fix the has_vector() issue in next version patchset.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v8 09/21] riscv: Add task switch support for vector
  2021-09-13 12:21   ` Darius Rad
@ 2021-09-28 14:56     ` Greentime Hu
  2021-09-29 13:28       ` Darius Rad
  0 siblings, 1 reply; 55+ messages in thread
From: Greentime Hu @ 2021-09-28 14:56 UTC (permalink / raw)
  To: Darius Rad
  Cc: linux-riscv, Linux Kernel Mailing List, Albert Ou,
	Palmer Dabbelt, Paul Walmsley, Vincent Chen

Darius Rad <darius@bluespec.com> 於 2021年9月13日 週一 下午8:21寫道:
>
> On 9/8/21 1:45 PM, Greentime Hu wrote:
> > This patch adds task switch support for vector. It supports partial lazy
> > save and restore mechanism. It also supports all lengths of vlen.
> >
> > [guoren@linux.alibaba.com: First available porting to support vector
> > context switching]
> > [nick.knight@sifive.com: Rewrite vector.S to support dynamic vlen, xlen and
> > code refine]
> > [vincent.chen@sifive.co: Fix the might_sleep issue in vstate_save,
> > vstate_restore]
> > Co-developed-by: Nick Knight <nick.knight@sifive.com>
> > Signed-off-by: Nick Knight <nick.knight@sifive.com>
> > Co-developed-by: Guo Ren <guoren@linux.alibaba.com>
> > Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
> > Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
> > Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
> > Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
> > ---
> >   arch/riscv/include/asm/switch_to.h | 66 +++++++++++++++++++++++
> >   arch/riscv/kernel/Makefile         |  1 +
> >   arch/riscv/kernel/process.c        | 38 ++++++++++++++
> >   arch/riscv/kernel/vector.S         | 84 ++++++++++++++++++++++++++++++
> >   4 files changed, 189 insertions(+)
> >   create mode 100644 arch/riscv/kernel/vector.S
> >
> > diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
> > index ec83770b3d98..de0573dad78f 100644
> > --- a/arch/riscv/include/asm/switch_to.h
> > +++ b/arch/riscv/include/asm/switch_to.h
> > @@ -7,10 +7,12 @@
> >   #define _ASM_RISCV_SWITCH_TO_H
> >
> >   #include <linux/jump_label.h>
> > +#include <linux/slab.h>
> >   #include <linux/sched/task_stack.h>
> >   #include <asm/processor.h>
> >   #include <asm/ptrace.h>
> >   #include <asm/csr.h>
> > +#include <asm/asm-offsets.h>
> >
> >   #ifdef CONFIG_FPU
> >   extern void __fstate_save(struct task_struct *save_to);
> > @@ -68,6 +70,68 @@ static __always_inline bool has_fpu(void) { return false; }
> >   #define __switch_to_fpu(__prev, __next) do { } while (0)
> >   #endif
> >
> > +#ifdef CONFIG_VECTOR
> > +extern bool has_vector;
> > +extern unsigned long riscv_vsize;
> > +extern void __vstate_save(struct __riscv_v_state *save_to, void *datap);
> > +extern void __vstate_restore(struct __riscv_v_state *restore_from, void *datap);
> > +
> > +static inline void __vstate_clean(struct pt_regs *regs)
> > +{
> > +     regs->status = (regs->status & ~(SR_VS)) | SR_VS_CLEAN;
> > +}
> > +
> > +static inline void vstate_off(struct task_struct *task,
> > +                           struct pt_regs *regs)
> > +{
> > +     regs->status = (regs->status & ~SR_VS) | SR_VS_OFF;
> > +}
> > +
> > +static inline void vstate_save(struct task_struct *task,
> > +                            struct pt_regs *regs)
> > +{
> > +     if ((regs->status & SR_VS) == SR_VS_DIRTY) {
> > +             struct __riscv_v_state *vstate = &(task->thread.vstate);
> > +
> > +             __vstate_save(vstate, vstate->datap);
> > +             __vstate_clean(regs);
> > +     }
> > +}
> > +
> > +static inline void vstate_restore(struct task_struct *task,
> > +                               struct pt_regs *regs)
> > +{
> > +     if ((regs->status & SR_VS) != SR_VS_OFF) {
> > +             struct __riscv_v_state *vstate = &(task->thread.vstate);
> > +
> > +             /* Allocate space for vector registers. */
> > +             if (!vstate->datap) {
> > +                     vstate->datap = kzalloc(riscv_vsize, GFP_ATOMIC);
> > +                     vstate->size = riscv_vsize;
> > +             }
> > +             __vstate_restore(vstate, vstate->datap);
> > +             __vstate_clean(regs);
> > +     }
> > +}
> > +
> > +static inline void __switch_to_vector(struct task_struct *prev,
> > +                                struct task_struct *next)
> > +{
> > +     struct pt_regs *regs;
> > +
> > +     regs = task_pt_regs(prev);
> > +     if (unlikely(regs->status & SR_SD))
> > +             vstate_save(prev, regs);
> > +     vstate_restore(next, task_pt_regs(next));
> > +}
> > +
> > +#else
> > +#define has_vector false
> > +#define vstate_save(task, regs) do { } while (0)
> > +#define vstate_restore(task, regs) do { } while (0)
> > +#define __switch_to_vector(__prev, __next) do { } while (0)
> > +#endif
> > +
> >   extern struct task_struct *__switch_to(struct task_struct *,
> >                                      struct task_struct *);
> >
> > @@ -77,6 +141,8 @@ do {                                                       \
> >       struct task_struct *__next = (next);            \
> >       if (has_fpu())                                  \
> >               __switch_to_fpu(__prev, __next);        \
> > +     if (has_vector)                                 \
> > +             __switch_to_vector(__prev, __next);     \
> >       ((last) = __switch_to(__prev, __next));         \
> >   } while (0)
> >
> > diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> > index 3397ddac1a30..344078080839 100644
> > --- a/arch/riscv/kernel/Makefile
> > +++ b/arch/riscv/kernel/Makefile
> > @@ -40,6 +40,7 @@ obj-$(CONFIG_MMU) += vdso.o vdso/
> >
> >   obj-$(CONFIG_RISCV_M_MODE)  += traps_misaligned.o
> >   obj-$(CONFIG_FPU)           += fpu.o
> > +obj-$(CONFIG_VECTOR)         += vector.o
> >   obj-$(CONFIG_SMP)           += smpboot.o
> >   obj-$(CONFIG_SMP)           += smp.o
> >   obj-$(CONFIG_SMP)           += cpu_ops.o
> > diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> > index 03ac3aa611f5..0b86e9e531c9 100644
> > --- a/arch/riscv/kernel/process.c
> > +++ b/arch/riscv/kernel/process.c
> > @@ -95,6 +95,16 @@ void start_thread(struct pt_regs *regs, unsigned long pc,
> >                */
> >               fstate_restore(current, regs);
> >       }
> > +
> > +     if (has_vector) {
> > +             regs->status |= SR_VS_INITIAL;
> > +             /*
> > +              * Restore the initial value to the vector register
> > +              * before starting the user program.
> > +              */
> > +             vstate_restore(current, regs);
> > +     }
> > +
>
> So this will unconditionally enable vector instructions, and allocate
> memory for vector state, for all processes, regardless of whether vector
> instructions are used?
>

Hi Darius,

Yes, it will enable vector if has_vector() is true. The reason that we
choose to enable and allocate memory for user space program is because
we also implement some common functions in the glibc such as memcpy
vector version and it is called very often by every process. So that
we assume if the user program is running in a CPU with vector ISA
would like to use vector by default. If we disable it by default and
make it trigger the illegal instruction, that might be a burden since
almost every process will use vector glibc memcpy or something like
that.

> Given the size of the vector state and potential power and performance
> implications of enabling the vector engine, it seems like this should
> treated similarly to Intel AMX on x86.  The full discussion of that is
> here:
>
> https://lore.kernel.org/lkml/CALCETrW2QHa2TLvnUuVxAAheqcbSZ-5_WRXtDSAGcbG8N+gtdQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org/
>
> The cover letter for recent Intel AMX patches has a summary of the x86
> implementation:
>
> https://lore.kernel.org/lkml/20210825155413.19673-1-chang.seok.bae@intel.com/
>
> If RISC-V were to adopt a similar approach, I think the significant
> points are:
>
>   1. A process (or thread) must specifically request the desire to use
> vector extensions (perhaps with some new arch_prctl() API),
>
>   2. The kernel is free to deny permission, perhaps based on
> administrative rules or for other reasons, and
>
>   3. If a process attempts to use vector extensions before doing the
> above, the process will die due to an illegal instruction.

Thank you for sharing this, but I am not sure if we should treat
vector like AMX on x86. IMHO, compiler might generate code with vector
instructions automatically someday, maybe we should treat vector
extensions like other extensions.
If user knows the vector extension is supported in this CPU and he
would like to use it, it seems we should let user use it directly just
like other extensions.
If user don't know it exists or not, user should use the library API
transparently and let glibc or other library deal with it. The glibc
ifunc feature or multi-lib should be able to choose the correct
implementation.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v8 09/21] riscv: Add task switch support for vector
  2021-09-28 14:56     ` Greentime Hu
@ 2021-09-29 13:28       ` Darius Rad
  2021-10-01  2:46         ` Ley Foon Tan
  2021-10-04 12:36         ` Greentime Hu
  0 siblings, 2 replies; 55+ messages in thread
From: Darius Rad @ 2021-09-29 13:28 UTC (permalink / raw)
  To: Greentime Hu
  Cc: linux-riscv, Linux Kernel Mailing List, Albert Ou,
	Palmer Dabbelt, Paul Walmsley, Vincent Chen

On Tue, Sep 28, 2021 at 10:56:52PM +0800, Greentime Hu wrote:
> Darius Rad <darius@bluespec.com> 於 2021年9月13日 週一 下午8:21寫道:
> >
> > On 9/8/21 1:45 PM, Greentime Hu wrote:
> > > This patch adds task switch support for vector. It supports partial lazy
> > > save and restore mechanism. It also supports all lengths of vlen.
> > >
> > > [guoren@linux.alibaba.com: First available porting to support vector
> > > context switching]
> > > [nick.knight@sifive.com: Rewrite vector.S to support dynamic vlen, xlen and
> > > code refine]
> > > [vincent.chen@sifive.co: Fix the might_sleep issue in vstate_save,
> > > vstate_restore]
> > > Co-developed-by: Nick Knight <nick.knight@sifive.com>
> > > Signed-off-by: Nick Knight <nick.knight@sifive.com>
> > > Co-developed-by: Guo Ren <guoren@linux.alibaba.com>
> > > Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
> > > Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
> > > Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
> > > Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
> > > ---
> > >   arch/riscv/include/asm/switch_to.h | 66 +++++++++++++++++++++++
> > >   arch/riscv/kernel/Makefile         |  1 +
> > >   arch/riscv/kernel/process.c        | 38 ++++++++++++++
> > >   arch/riscv/kernel/vector.S         | 84 ++++++++++++++++++++++++++++++
> > >   4 files changed, 189 insertions(+)
> > >   create mode 100644 arch/riscv/kernel/vector.S
> > >
> > > diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
> > > index ec83770b3d98..de0573dad78f 100644
> > > --- a/arch/riscv/include/asm/switch_to.h
> > > +++ b/arch/riscv/include/asm/switch_to.h
> > > @@ -7,10 +7,12 @@
> > >   #define _ASM_RISCV_SWITCH_TO_H
> > >
> > >   #include <linux/jump_label.h>
> > > +#include <linux/slab.h>
> > >   #include <linux/sched/task_stack.h>
> > >   #include <asm/processor.h>
> > >   #include <asm/ptrace.h>
> > >   #include <asm/csr.h>
> > > +#include <asm/asm-offsets.h>
> > >
> > >   #ifdef CONFIG_FPU
> > >   extern void __fstate_save(struct task_struct *save_to);
> > > @@ -68,6 +70,68 @@ static __always_inline bool has_fpu(void) { return false; }
> > >   #define __switch_to_fpu(__prev, __next) do { } while (0)
> > >   #endif
> > >
> > > +#ifdef CONFIG_VECTOR
> > > +extern bool has_vector;
> > > +extern unsigned long riscv_vsize;
> > > +extern void __vstate_save(struct __riscv_v_state *save_to, void *datap);
> > > +extern void __vstate_restore(struct __riscv_v_state *restore_from, void *datap);
> > > +
> > > +static inline void __vstate_clean(struct pt_regs *regs)
> > > +{
> > > +     regs->status = (regs->status & ~(SR_VS)) | SR_VS_CLEAN;
> > > +}
> > > +
> > > +static inline void vstate_off(struct task_struct *task,
> > > +                           struct pt_regs *regs)
> > > +{
> > > +     regs->status = (regs->status & ~SR_VS) | SR_VS_OFF;
> > > +}
> > > +
> > > +static inline void vstate_save(struct task_struct *task,
> > > +                            struct pt_regs *regs)
> > > +{
> > > +     if ((regs->status & SR_VS) == SR_VS_DIRTY) {
> > > +             struct __riscv_v_state *vstate = &(task->thread.vstate);
> > > +
> > > +             __vstate_save(vstate, vstate->datap);
> > > +             __vstate_clean(regs);
> > > +     }
> > > +}
> > > +
> > > +static inline void vstate_restore(struct task_struct *task,
> > > +                               struct pt_regs *regs)
> > > +{
> > > +     if ((regs->status & SR_VS) != SR_VS_OFF) {
> > > +             struct __riscv_v_state *vstate = &(task->thread.vstate);
> > > +
> > > +             /* Allocate space for vector registers. */
> > > +             if (!vstate->datap) {
> > > +                     vstate->datap = kzalloc(riscv_vsize, GFP_ATOMIC);
> > > +                     vstate->size = riscv_vsize;
> > > +             }
> > > +             __vstate_restore(vstate, vstate->datap);
> > > +             __vstate_clean(regs);
> > > +     }
> > > +}
> > > +
> > > +static inline void __switch_to_vector(struct task_struct *prev,
> > > +                                struct task_struct *next)
> > > +{
> > > +     struct pt_regs *regs;
> > > +
> > > +     regs = task_pt_regs(prev);
> > > +     if (unlikely(regs->status & SR_SD))
> > > +             vstate_save(prev, regs);
> > > +     vstate_restore(next, task_pt_regs(next));
> > > +}
> > > +
> > > +#else
> > > +#define has_vector false
> > > +#define vstate_save(task, regs) do { } while (0)
> > > +#define vstate_restore(task, regs) do { } while (0)
> > > +#define __switch_to_vector(__prev, __next) do { } while (0)
> > > +#endif
> > > +
> > >   extern struct task_struct *__switch_to(struct task_struct *,
> > >                                      struct task_struct *);
> > >
> > > @@ -77,6 +141,8 @@ do {                                                       \
> > >       struct task_struct *__next = (next);            \
> > >       if (has_fpu())                                  \
> > >               __switch_to_fpu(__prev, __next);        \
> > > +     if (has_vector)                                 \
> > > +             __switch_to_vector(__prev, __next);     \
> > >       ((last) = __switch_to(__prev, __next));         \
> > >   } while (0)
> > >
> > > diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> > > index 3397ddac1a30..344078080839 100644
> > > --- a/arch/riscv/kernel/Makefile
> > > +++ b/arch/riscv/kernel/Makefile
> > > @@ -40,6 +40,7 @@ obj-$(CONFIG_MMU) += vdso.o vdso/
> > >
> > >   obj-$(CONFIG_RISCV_M_MODE)  += traps_misaligned.o
> > >   obj-$(CONFIG_FPU)           += fpu.o
> > > +obj-$(CONFIG_VECTOR)         += vector.o
> > >   obj-$(CONFIG_SMP)           += smpboot.o
> > >   obj-$(CONFIG_SMP)           += smp.o
> > >   obj-$(CONFIG_SMP)           += cpu_ops.o
> > > diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> > > index 03ac3aa611f5..0b86e9e531c9 100644
> > > --- a/arch/riscv/kernel/process.c
> > > +++ b/arch/riscv/kernel/process.c
> > > @@ -95,6 +95,16 @@ void start_thread(struct pt_regs *regs, unsigned long pc,
> > >                */
> > >               fstate_restore(current, regs);
> > >       }
> > > +
> > > +     if (has_vector) {
> > > +             regs->status |= SR_VS_INITIAL;
> > > +             /*
> > > +              * Restore the initial value to the vector register
> > > +              * before starting the user program.
> > > +              */
> > > +             vstate_restore(current, regs);
> > > +     }
> > > +
> >
> > So this will unconditionally enable vector instructions, and allocate
> > memory for vector state, for all processes, regardless of whether vector
> > instructions are used?
> >
> 
> Hi Darius,
> 
> Yes, it will enable vector if has_vector() is true. The reason that we
> choose to enable and allocate memory for user space program is because
> we also implement some common functions in the glibc such as memcpy
> vector version and it is called very often by every process. So that
> we assume if the user program is running in a CPU with vector ISA
> would like to use vector by default. If we disable it by default and
> make it trigger the illegal instruction, that might be a burden since
> almost every process will use vector glibc memcpy or something like
> that.

Do you have any evidence to support the assertion that almost every process
would use vector operations?  One could easily argue that the converse is
true: no existing software uses the vector extension now, so most likely a
process will not be using it.

> 
> > Given the size of the vector state and potential power and performance
> > implications of enabling the vector engine, it seems like this should
> > treated similarly to Intel AMX on x86.  The full discussion of that is
> > here:
> >
> > https://lore.kernel.org/lkml/CALCETrW2QHa2TLvnUuVxAAheqcbSZ-5_WRXtDSAGcbG8N+gtdQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org/
> >
> > The cover letter for recent Intel AMX patches has a summary of the x86
> > implementation:
> >
> > https://lore.kernel.org/lkml/20210825155413.19673-1-chang.seok.bae@intel.com/
> >
> > If RISC-V were to adopt a similar approach, I think the significant
> > points are:
> >
> >   1. A process (or thread) must specifically request the desire to use
> > vector extensions (perhaps with some new arch_prctl() API),
> >
> >   2. The kernel is free to deny permission, perhaps based on
> > administrative rules or for other reasons, and
> >
> >   3. If a process attempts to use vector extensions before doing the
> > above, the process will die due to an illegal instruction.
> 
> Thank you for sharing this, but I am not sure if we should treat
> vector like AMX on x86. IMHO, compiler might generate code with vector
> instructions automatically someday, maybe we should treat vector
> extensions like other extensions.
> If user knows the vector extension is supported in this CPU and he
> would like to use it, it seems we should let user use it directly just
> like other extensions.
> If user don't know it exists or not, user should use the library API
> transparently and let glibc or other library deal with it. The glibc
> ifunc feature or multi-lib should be able to choose the correct
> implementation.

What makes me think that the vector extension should be treated like AMX is
that they both (1) have a significant amount of architectural state, and
(2) likely have a significant power and/or area impact on (non-emulated)
designs.

For example, I think it is possible, maybe even likely, that vector
implementations will have one or more of the following behaviors:

  1. A single vector unit shared among two or more harts,

  2. Additional power consumption when the vector unit is enabled and idle
versus not being enabled at all,

  3. For a system which supports variable operating frequency, a reduction
in the maximum frequency when the vector unit is enabled, and/or

  4. The inability to enter low power states and/or delays to low power
states transitions when the vector unit is enabled.

None of the above constraints apply to more ordinary extensions like
compressed or the various bit manipulation extensions.

The discussion I linked to has some well reasoned arguments on why
substantial extensions should have a mechanism to request using them by
user space.  The discussion was in the context of Intel AMX, but applies to
further x86 extensions, and I think should also apply to similar extensions
on RISC-V, like vector here.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v8 11/21] riscv: Add sigcontext save/restore for vector
  2021-09-08 17:45 ` [RFC PATCH v8 11/21] riscv: Add sigcontext save/restore for vector Greentime Hu
@ 2021-09-30  2:37   ` Ley Foon Tan
  0 siblings, 0 replies; 55+ messages in thread
From: Ley Foon Tan @ 2021-09-30  2:37 UTC (permalink / raw)
  To: Greentime Hu
  Cc: linux-riscv, Linux Kernel Mailing List, aou, Palmer Dabbelt,
	Paul Walmsley, vincent.chen

On Thu, Sep 9, 2021 at 1:49 AM Greentime Hu <greentime.hu@sifive.com> wrote:
>
> This patch adds sigcontext save/restore for vector. The vector registers
> will be saved in datap pointer. The datap pointer will be allocated
> dynamically when the task needs in kernel space. The datap pointer will
> be set right after the __riscv_v_state data structure to save all the
> vector registers in the signal handler stack.
>
> Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
> Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
> Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
> ---
>  arch/riscv/include/uapi/asm/sigcontext.h |  24 ++++
>  arch/riscv/kernel/asm-offsets.c          |   2 +
>  arch/riscv/kernel/setup.c                |   4 +
>  arch/riscv/kernel/signal.c               | 164 ++++++++++++++++++++++-
>  4 files changed, 190 insertions(+), 4 deletions(-)
>

[....]


> +
> +static size_t cal_rt_frame_size(void)
> +{
> +       struct rt_sigframe __user *frame;
> +       static size_t frame_size;
> +       size_t total_context_size = 0;
> +       size_t sc_reserved_size = sizeof(frame->uc.uc_mcontext.__reserved);
> +
> +       if (frame_size)
> +               goto done;
> +
> +       frame_size = sizeof(*frame);
> +
> +       if (has_vector)
> +               total_context_size += rvv_sc_size;
> +       /* Preserved a __riscv_ctx_hdr for END signal context header. */
> +       total_context_size += sizeof(struct __riscv_ctx_hdr);
> +
> +       if (total_context_size > sc_reserved_size)
> +               frame_size += (total_context_size - sc_reserved_size);
> +
> +done:
> +       return round_up(frame_size, 16);

Hi Greentime,

frame_size is static variable here, so it will preserve the value for
the next calling to cal_rt_frame_size().

I think we need update frame_size before return, example:

frame_size =  round_up(frame_size, 16);
return frame_size;


Regards
Ley Foon

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v8 09/21] riscv: Add task switch support for vector
  2021-09-29 13:28       ` Darius Rad
@ 2021-10-01  2:46         ` Ley Foon Tan
  2021-10-04 12:41           ` Greentime Hu
  2021-10-04 12:36         ` Greentime Hu
  1 sibling, 1 reply; 55+ messages in thread
From: Ley Foon Tan @ 2021-10-01  2:46 UTC (permalink / raw)
  To: Darius Rad
  Cc: Greentime Hu, linux-riscv, Linux Kernel Mailing List, Albert Ou,
	Palmer Dabbelt, Paul Walmsley, Vincent Chen

On Wed, Sep 29, 2021 at 11:54 PM Darius Rad <darius@bluespec.com> wrote:
>
> On Tue, Sep 28, 2021 at 10:56:52PM +0800, Greentime Hu wrote:
> > Darius Rad <darius@bluespec.com> 於 2021年9月13日 週一 下午8:21寫道:
> > >
> > > On 9/8/21 1:45 PM, Greentime Hu wrote:
> > > > This patch adds task switch support for vector. It supports partial lazy
> > > > save and restore mechanism. It also supports all lengths of vlen.
> > > >
> > > > [guoren@linux.alibaba.com: First available porting to support vector
> > > > context switching]
> > > > [nick.knight@sifive.com: Rewrite vector.S to support dynamic vlen, xlen and
> > > > code refine]
> > > > [vincent.chen@sifive.co: Fix the might_sleep issue in vstate_save,
> > > > vstate_restore]
> > > > Co-developed-by: Nick Knight <nick.knight@sifive.com>
> > > > Signed-off-by: Nick Knight <nick.knight@sifive.com>
> > > > Co-developed-by: Guo Ren <guoren@linux.alibaba.com>
> > > > Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
> > > > Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
> > > > Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
> > > > Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
> > > > ---
> > > >   arch/riscv/include/asm/switch_to.h | 66 +++++++++++++++++++++++
> > > >   arch/riscv/kernel/Makefile         |  1 +
> > > >   arch/riscv/kernel/process.c        | 38 ++++++++++++++
> > > >   arch/riscv/kernel/vector.S         | 84 ++++++++++++++++++++++++++++++
> > > >   4 files changed, 189 insertions(+)
> > > >   create mode 100644 arch/riscv/kernel/vector.S
> > > >
> > > > diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
> > > > index ec83770b3d98..de0573dad78f 100644
> > > > --- a/arch/riscv/include/asm/switch_to.h
> > > > +++ b/arch/riscv/include/asm/switch_to.h
> > > > @@ -7,10 +7,12 @@
> > > >   #define _ASM_RISCV_SWITCH_TO_H
> > > >
> > > >   #include <linux/jump_label.h>
> > > > +#include <linux/slab.h>
> > > >   #include <linux/sched/task_stack.h>
> > > >   #include <asm/processor.h>
> > > >   #include <asm/ptrace.h>
> > > >   #include <asm/csr.h>
> > > > +#include <asm/asm-offsets.h>
> > > >
> > > >   #ifdef CONFIG_FPU
> > > >   extern void __fstate_save(struct task_struct *save_to);
> > > > @@ -68,6 +70,68 @@ static __always_inline bool has_fpu(void) { return false; }
> > > >   #define __switch_to_fpu(__prev, __next) do { } while (0)
> > > >   #endif
> > > >
> > > > +#ifdef CONFIG_VECTOR
> > > > +extern bool has_vector;
> > > > +extern unsigned long riscv_vsize;
> > > > +extern void __vstate_save(struct __riscv_v_state *save_to, void *datap);
> > > > +extern void __vstate_restore(struct __riscv_v_state *restore_from, void *datap);
> > > > +
> > > > +static inline void __vstate_clean(struct pt_regs *regs)
> > > > +{
> > > > +     regs->status = (regs->status & ~(SR_VS)) | SR_VS_CLEAN;
> > > > +}
> > > > +
> > > > +static inline void vstate_off(struct task_struct *task,
> > > > +                           struct pt_regs *regs)
> > > > +{
> > > > +     regs->status = (regs->status & ~SR_VS) | SR_VS_OFF;
> > > > +}
> > > > +
> > > > +static inline void vstate_save(struct task_struct *task,
> > > > +                            struct pt_regs *regs)
> > > > +{
> > > > +     if ((regs->status & SR_VS) == SR_VS_DIRTY) {
> > > > +             struct __riscv_v_state *vstate = &(task->thread.vstate);
> > > > +
> > > > +             __vstate_save(vstate, vstate->datap);
> > > > +             __vstate_clean(regs);
> > > > +     }
> > > > +}
> > > > +
> > > > +static inline void vstate_restore(struct task_struct *task,
> > > > +                               struct pt_regs *regs)
> > > > +{
> > > > +     if ((regs->status & SR_VS) != SR_VS_OFF) {
> > > > +             struct __riscv_v_state *vstate = &(task->thread.vstate);
> > > > +
> > > > +             /* Allocate space for vector registers. */
> > > > +             if (!vstate->datap) {
> > > > +                     vstate->datap = kzalloc(riscv_vsize, GFP_ATOMIC);
> > > > +                     vstate->size = riscv_vsize;
> > > > +             }
> > > > +             __vstate_restore(vstate, vstate->datap);
> > > > +             __vstate_clean(regs);
> > > > +     }
> > > > +}
> > > > +
> > > > +static inline void __switch_to_vector(struct task_struct *prev,
> > > > +                                struct task_struct *next)
> > > > +{
> > > > +     struct pt_regs *regs;
> > > > +
> > > > +     regs = task_pt_regs(prev);
> > > > +     if (unlikely(regs->status & SR_SD))
> > > > +             vstate_save(prev, regs);
> > > > +     vstate_restore(next, task_pt_regs(next));
> > > > +}
> > > > +
> > > > +#else
> > > > +#define has_vector false
> > > > +#define vstate_save(task, regs) do { } while (0)
> > > > +#define vstate_restore(task, regs) do { } while (0)
> > > > +#define __switch_to_vector(__prev, __next) do { } while (0)
> > > > +#endif
> > > > +
> > > >   extern struct task_struct *__switch_to(struct task_struct *,
> > > >                                      struct task_struct *);
> > > >
> > > > @@ -77,6 +141,8 @@ do {                                                       \
> > > >       struct task_struct *__next = (next);            \
> > > >       if (has_fpu())                                  \
> > > >               __switch_to_fpu(__prev, __next);        \
> > > > +     if (has_vector)                                 \
> > > > +             __switch_to_vector(__prev, __next);     \
> > > >       ((last) = __switch_to(__prev, __next));         \
> > > >   } while (0)
> > > >
> > > > diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> > > > index 3397ddac1a30..344078080839 100644
> > > > --- a/arch/riscv/kernel/Makefile
> > > > +++ b/arch/riscv/kernel/Makefile
> > > > @@ -40,6 +40,7 @@ obj-$(CONFIG_MMU) += vdso.o vdso/
> > > >
> > > >   obj-$(CONFIG_RISCV_M_MODE)  += traps_misaligned.o
> > > >   obj-$(CONFIG_FPU)           += fpu.o
> > > > +obj-$(CONFIG_VECTOR)         += vector.o
> > > >   obj-$(CONFIG_SMP)           += smpboot.o
> > > >   obj-$(CONFIG_SMP)           += smp.o
> > > >   obj-$(CONFIG_SMP)           += cpu_ops.o
> > > > diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> > > > index 03ac3aa611f5..0b86e9e531c9 100644
> > > > --- a/arch/riscv/kernel/process.c
> > > > +++ b/arch/riscv/kernel/process.c
> > > > @@ -95,6 +95,16 @@ void start_thread(struct pt_regs *regs, unsigned long pc,
> > > >                */
> > > >               fstate_restore(current, regs);
> > > >       }
> > > > +
> > > > +     if (has_vector) {
> > > > +             regs->status |= SR_VS_INITIAL;
> > > > +             /*
> > > > +              * Restore the initial value to the vector register
> > > > +              * before starting the user program.
> > > > +              */
> > > > +             vstate_restore(current, regs);
> > > > +     }
> > > > +
> > >
> > > So this will unconditionally enable vector instructions, and allocate
> > > memory for vector state, for all processes, regardless of whether vector
> > > instructions are used?
> > >
> >
> > Hi Darius,
> >
> > Yes, it will enable vector if has_vector() is true. The reason that we
> > choose to enable and allocate memory for user space program is because
> > we also implement some common functions in the glibc such as memcpy
> > vector version and it is called very often by every process. So that
> > we assume if the user program is running in a CPU with vector ISA
> > would like to use vector by default. If we disable it by default and
> > make it trigger the illegal instruction, that might be a burden since
> > almost every process will use vector glibc memcpy or something like
> > that.
>
> Do you have any evidence to support the assertion that almost every process
> would use vector operations?  One could easily argue that the converse is
> true: no existing software uses the vector extension now, so most likely a
> process will not be using it.
>
> >
> > > Given the size of the vector state and potential power and performance
> > > implications of enabling the vector engine, it seems like this should
> > > treated similarly to Intel AMX on x86.  The full discussion of that is
> > > here:
> > >
> > > https://lore.kernel.org/lkml/CALCETrW2QHa2TLvnUuVxAAheqcbSZ-5_WRXtDSAGcbG8N+gtdQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org/
> > >
> > > The cover letter for recent Intel AMX patches has a summary of the x86
> > > implementation:
> > >
> > > https://lore.kernel.org/lkml/20210825155413.19673-1-chang.seok.bae@intel.com/
> > >
> > > If RISC-V were to adopt a similar approach, I think the significant
> > > points are:
> > >
> > >   1. A process (or thread) must specifically request the desire to use
> > > vector extensions (perhaps with some new arch_prctl() API),
> > >
> > >   2. The kernel is free to deny permission, perhaps based on
> > > administrative rules or for other reasons, and
> > >
> > >   3. If a process attempts to use vector extensions before doing the
> > > above, the process will die due to an illegal instruction.
> >
> > Thank you for sharing this, but I am not sure if we should treat
> > vector like AMX on x86. IMHO, compiler might generate code with vector
> > instructions automatically someday, maybe we should treat vector
> > extensions like other extensions.
> > If user knows the vector extension is supported in this CPU and he
> > would like to use it, it seems we should let user use it directly just
> > like other extensions.
> > If user don't know it exists or not, user should use the library API
> > transparently and let glibc or other library deal with it. The glibc
> > ifunc feature or multi-lib should be able to choose the correct
> > implementation.
>
> What makes me think that the vector extension should be treated like AMX is
> that they both (1) have a significant amount of architectural state, and
> (2) likely have a significant power and/or area impact on (non-emulated)
> designs.
>
> For example, I think it is possible, maybe even likely, that vector
> implementations will have one or more of the following behaviors:
>
>   1. A single vector unit shared among two or more harts,
>
>   2. Additional power consumption when the vector unit is enabled and idle
> versus not being enabled at all,
>
>   3. For a system which supports variable operating frequency, a reduction
> in the maximum frequency when the vector unit is enabled, and/or
>
>   4. The inability to enter low power states and/or delays to low power
> states transitions when the vector unit is enabled.
>
> None of the above constraints apply to more ordinary extensions like
> compressed or the various bit manipulation extensions.
>
> The discussion I linked to has some well reasoned arguments on why
> substantial extensions should have a mechanism to request using them by
> user space.  The discussion was in the context of Intel AMX, but applies to
> further x86 extensions, and I think should also apply to similar extensions
> on RISC-V, like vector here.
>
There is possible use case where not all cores support vector
extension due to size, area and power.
Perhaps can have the mechanism or flow to determine the
application/thread require vector extension or it specifically request
the desire to use
vector extensions. Then this app/thread run on cpu with vector
extension capability only.

Thanks.

Regards
Ley Foon

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v8 09/21] riscv: Add task switch support for vector
  2021-09-29 13:28       ` Darius Rad
  2021-10-01  2:46         ` Ley Foon Tan
@ 2021-10-04 12:36         ` Greentime Hu
  2021-10-05 13:57           ` Darius Rad
  1 sibling, 1 reply; 55+ messages in thread
From: Greentime Hu @ 2021-10-04 12:36 UTC (permalink / raw)
  To: Darius Rad
  Cc: linux-riscv, Linux Kernel Mailing List, Albert Ou,
	Palmer Dabbelt, Paul Walmsley, Vincent Chen

Darius Rad <darius@bluespec.com> 於 2021年9月29日 週三 下午9:28寫道:
>
> On Tue, Sep 28, 2021 at 10:56:52PM +0800, Greentime Hu wrote:
> > Darius Rad <darius@bluespec.com> 於 2021年9月13日 週一 下午8:21寫道:
> > >
> > > On 9/8/21 1:45 PM, Greentime Hu wrote:
> > > > This patch adds task switch support for vector. It supports partial lazy
> > > > save and restore mechanism. It also supports all lengths of vlen.
> > > >
> > > > [guoren@linux.alibaba.com: First available porting to support vector
> > > > context switching]
> > > > [nick.knight@sifive.com: Rewrite vector.S to support dynamic vlen, xlen and
> > > > code refine]
> > > > [vincent.chen@sifive.co: Fix the might_sleep issue in vstate_save,
> > > > vstate_restore]
> > > > Co-developed-by: Nick Knight <nick.knight@sifive.com>
> > > > Signed-off-by: Nick Knight <nick.knight@sifive.com>
> > > > Co-developed-by: Guo Ren <guoren@linux.alibaba.com>
> > > > Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
> > > > Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
> > > > Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
> > > > Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
> > > > ---
> > > >   arch/riscv/include/asm/switch_to.h | 66 +++++++++++++++++++++++
> > > >   arch/riscv/kernel/Makefile         |  1 +
> > > >   arch/riscv/kernel/process.c        | 38 ++++++++++++++
> > > >   arch/riscv/kernel/vector.S         | 84 ++++++++++++++++++++++++++++++
> > > >   4 files changed, 189 insertions(+)
> > > >   create mode 100644 arch/riscv/kernel/vector.S
> > > >
> > > > diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
> > > > index ec83770b3d98..de0573dad78f 100644
> > > > --- a/arch/riscv/include/asm/switch_to.h
> > > > +++ b/arch/riscv/include/asm/switch_to.h
> > > > @@ -7,10 +7,12 @@
> > > >   #define _ASM_RISCV_SWITCH_TO_H
> > > >
> > > >   #include <linux/jump_label.h>
> > > > +#include <linux/slab.h>
> > > >   #include <linux/sched/task_stack.h>
> > > >   #include <asm/processor.h>
> > > >   #include <asm/ptrace.h>
> > > >   #include <asm/csr.h>
> > > > +#include <asm/asm-offsets.h>
> > > >
> > > >   #ifdef CONFIG_FPU
> > > >   extern void __fstate_save(struct task_struct *save_to);
> > > > @@ -68,6 +70,68 @@ static __always_inline bool has_fpu(void) { return false; }
> > > >   #define __switch_to_fpu(__prev, __next) do { } while (0)
> > > >   #endif
> > > >
> > > > +#ifdef CONFIG_VECTOR
> > > > +extern bool has_vector;
> > > > +extern unsigned long riscv_vsize;
> > > > +extern void __vstate_save(struct __riscv_v_state *save_to, void *datap);
> > > > +extern void __vstate_restore(struct __riscv_v_state *restore_from, void *datap);
> > > > +
> > > > +static inline void __vstate_clean(struct pt_regs *regs)
> > > > +{
> > > > +     regs->status = (regs->status & ~(SR_VS)) | SR_VS_CLEAN;
> > > > +}
> > > > +
> > > > +static inline void vstate_off(struct task_struct *task,
> > > > +                           struct pt_regs *regs)
> > > > +{
> > > > +     regs->status = (regs->status & ~SR_VS) | SR_VS_OFF;
> > > > +}
> > > > +
> > > > +static inline void vstate_save(struct task_struct *task,
> > > > +                            struct pt_regs *regs)
> > > > +{
> > > > +     if ((regs->status & SR_VS) == SR_VS_DIRTY) {
> > > > +             struct __riscv_v_state *vstate = &(task->thread.vstate);
> > > > +
> > > > +             __vstate_save(vstate, vstate->datap);
> > > > +             __vstate_clean(regs);
> > > > +     }
> > > > +}
> > > > +
> > > > +static inline void vstate_restore(struct task_struct *task,
> > > > +                               struct pt_regs *regs)
> > > > +{
> > > > +     if ((regs->status & SR_VS) != SR_VS_OFF) {
> > > > +             struct __riscv_v_state *vstate = &(task->thread.vstate);
> > > > +
> > > > +             /* Allocate space for vector registers. */
> > > > +             if (!vstate->datap) {
> > > > +                     vstate->datap = kzalloc(riscv_vsize, GFP_ATOMIC);
> > > > +                     vstate->size = riscv_vsize;
> > > > +             }
> > > > +             __vstate_restore(vstate, vstate->datap);
> > > > +             __vstate_clean(regs);
> > > > +     }
> > > > +}
> > > > +
> > > > +static inline void __switch_to_vector(struct task_struct *prev,
> > > > +                                struct task_struct *next)
> > > > +{
> > > > +     struct pt_regs *regs;
> > > > +
> > > > +     regs = task_pt_regs(prev);
> > > > +     if (unlikely(regs->status & SR_SD))
> > > > +             vstate_save(prev, regs);
> > > > +     vstate_restore(next, task_pt_regs(next));
> > > > +}
> > > > +
> > > > +#else
> > > > +#define has_vector false
> > > > +#define vstate_save(task, regs) do { } while (0)
> > > > +#define vstate_restore(task, regs) do { } while (0)
> > > > +#define __switch_to_vector(__prev, __next) do { } while (0)
> > > > +#endif
> > > > +
> > > >   extern struct task_struct *__switch_to(struct task_struct *,
> > > >                                      struct task_struct *);
> > > >
> > > > @@ -77,6 +141,8 @@ do {                                                       \
> > > >       struct task_struct *__next = (next);            \
> > > >       if (has_fpu())                                  \
> > > >               __switch_to_fpu(__prev, __next);        \
> > > > +     if (has_vector)                                 \
> > > > +             __switch_to_vector(__prev, __next);     \
> > > >       ((last) = __switch_to(__prev, __next));         \
> > > >   } while (0)
> > > >
> > > > diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> > > > index 3397ddac1a30..344078080839 100644
> > > > --- a/arch/riscv/kernel/Makefile
> > > > +++ b/arch/riscv/kernel/Makefile
> > > > @@ -40,6 +40,7 @@ obj-$(CONFIG_MMU) += vdso.o vdso/
> > > >
> > > >   obj-$(CONFIG_RISCV_M_MODE)  += traps_misaligned.o
> > > >   obj-$(CONFIG_FPU)           += fpu.o
> > > > +obj-$(CONFIG_VECTOR)         += vector.o
> > > >   obj-$(CONFIG_SMP)           += smpboot.o
> > > >   obj-$(CONFIG_SMP)           += smp.o
> > > >   obj-$(CONFIG_SMP)           += cpu_ops.o
> > > > diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> > > > index 03ac3aa611f5..0b86e9e531c9 100644
> > > > --- a/arch/riscv/kernel/process.c
> > > > +++ b/arch/riscv/kernel/process.c
> > > > @@ -95,6 +95,16 @@ void start_thread(struct pt_regs *regs, unsigned long pc,
> > > >                */
> > > >               fstate_restore(current, regs);
> > > >       }
> > > > +
> > > > +     if (has_vector) {
> > > > +             regs->status |= SR_VS_INITIAL;
> > > > +             /*
> > > > +              * Restore the initial value to the vector register
> > > > +              * before starting the user program.
> > > > +              */
> > > > +             vstate_restore(current, regs);
> > > > +     }
> > > > +
> > >
> > > So this will unconditionally enable vector instructions, and allocate
> > > memory for vector state, for all processes, regardless of whether vector
> > > instructions are used?
> > >
> >
> > Hi Darius,
> >
> > Yes, it will enable vector if has_vector() is true. The reason that we
> > choose to enable and allocate memory for user space program is because
> > we also implement some common functions in the glibc such as memcpy
> > vector version and it is called very often by every process. So that
> > we assume if the user program is running in a CPU with vector ISA
> > would like to use vector by default. If we disable it by default and
> > make it trigger the illegal instruction, that might be a burden since
> > almost every process will use vector glibc memcpy or something like
> > that.
>
> Do you have any evidence to support the assertion that almost every process
> would use vector operations?  One could easily argue that the converse is
> true: no existing software uses the vector extension now, so most likely a
> process will not be using it.

Glibc ustreaming is just starting so you didn't see software using the
vector extension now and this patchset is testing based on those
optimized glibc too.
Vincent Chen is working on the glibc vector support upstreaming and we
will also upstream the vector version glibc memcpy, memcmp, memchr,
memmove, memset, strcmp, strlen.
Then we will see platform with vector support can use vector version
mem* and str* functions automatically based on ifunc and platform
without vector will use the original one automatically. These could be
done to select the correct optimized glibc functions by ifunc
mechanism.

>
> >
> > > Given the size of the vector state and potential power and performance
> > > implications of enabling the vector engine, it seems like this should
> > > treated similarly to Intel AMX on x86.  The full discussion of that is
> > > here:
> > >
> > > https://lore.kernel.org/lkml/CALCETrW2QHa2TLvnUuVxAAheqcbSZ-5_WRXtDSAGcbG8N+gtdQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org/
> > >
> > > The cover letter for recent Intel AMX patches has a summary of the x86
> > > implementation:
> > >
> > > https://lore.kernel.org/lkml/20210825155413.19673-1-chang.seok.bae@intel.com/
> > >
> > > If RISC-V were to adopt a similar approach, I think the significant
> > > points are:
> > >
> > >   1. A process (or thread) must specifically request the desire to use
> > > vector extensions (perhaps with some new arch_prctl() API),
> > >
> > >   2. The kernel is free to deny permission, perhaps based on
> > > administrative rules or for other reasons, and
> > >
> > >   3. If a process attempts to use vector extensions before doing the
> > > above, the process will die due to an illegal instruction.
> >
> > Thank you for sharing this, but I am not sure if we should treat
> > vector like AMX on x86. IMHO, compiler might generate code with vector
> > instructions automatically someday, maybe we should treat vector
> > extensions like other extensions.
> > If user knows the vector extension is supported in this CPU and he
> > would like to use it, it seems we should let user use it directly just
> > like other extensions.
> > If user don't know it exists or not, user should use the library API
> > transparently and let glibc or other library deal with it. The glibc
> > ifunc feature or multi-lib should be able to choose the correct
> > implementation.
>
> What makes me think that the vector extension should be treated like AMX is
> that they both (1) have a significant amount of architectural state, and
> (2) likely have a significant power and/or area impact on (non-emulated)
> designs.
>
> For example, I think it is possible, maybe even likely, that vector
> implementations will have one or more of the following behaviors:
>
>   1. A single vector unit shared among two or more harts,
>
>   2. Additional power consumption when the vector unit is enabled and idle
> versus not being enabled at all,
>
>   3. For a system which supports variable operating frequency, a reduction
> in the maximum frequency when the vector unit is enabled, and/or
>
>   4. The inability to enter low power states and/or delays to low power
> states transitions when the vector unit is enabled.
>
> None of the above constraints apply to more ordinary extensions like
> compressed or the various bit manipulation extensions.
>
> The discussion I linked to has some well reasoned arguments on why
> substantial extensions should have a mechanism to request using them by
> user space.  The discussion was in the context of Intel AMX, but applies to
> further x86 extensions, and I think should also apply to similar extensions
> on RISC-V, like vector here.

Have you ever checked the SVE/SVE2 of ARM64 implementation in Linux kernel too?
IMHO, the vector of RISCV should be closer to the SVE2 of ARM64.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v8 09/21] riscv: Add task switch support for vector
  2021-10-01  2:46         ` Ley Foon Tan
@ 2021-10-04 12:41           ` Greentime Hu
  2021-10-05  2:12             ` Ley Foon Tan
  0 siblings, 1 reply; 55+ messages in thread
From: Greentime Hu @ 2021-10-04 12:41 UTC (permalink / raw)
  To: Ley Foon Tan
  Cc: Darius Rad, linux-riscv, Linux Kernel Mailing List, Albert Ou,
	Palmer Dabbelt, Paul Walmsley, Vincent Chen

Ley Foon Tan <lftan.linux@gmail.com> 於 2021年10月1日 週五 上午10:46寫道:
>
> On Wed, Sep 29, 2021 at 11:54 PM Darius Rad <darius@bluespec.com> wrote:
> >
> > On Tue, Sep 28, 2021 at 10:56:52PM +0800, Greentime Hu wrote:
> > > Darius Rad <darius@bluespec.com> 於 2021年9月13日 週一 下午8:21寫道:
> > > >
> > > > On 9/8/21 1:45 PM, Greentime Hu wrote:
> > > > > This patch adds task switch support for vector. It supports partial lazy
> > > > > save and restore mechanism. It also supports all lengths of vlen.
> > > > >
> > > > > [guoren@linux.alibaba.com: First available porting to support vector
> > > > > context switching]
> > > > > [nick.knight@sifive.com: Rewrite vector.S to support dynamic vlen, xlen and
> > > > > code refine]
> > > > > [vincent.chen@sifive.co: Fix the might_sleep issue in vstate_save,
> > > > > vstate_restore]
> > > > > Co-developed-by: Nick Knight <nick.knight@sifive.com>
> > > > > Signed-off-by: Nick Knight <nick.knight@sifive.com>
> > > > > Co-developed-by: Guo Ren <guoren@linux.alibaba.com>
> > > > > Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
> > > > > Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
> > > > > Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
> > > > > Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
> > > > > ---
> > > > >   arch/riscv/include/asm/switch_to.h | 66 +++++++++++++++++++++++
> > > > >   arch/riscv/kernel/Makefile         |  1 +
> > > > >   arch/riscv/kernel/process.c        | 38 ++++++++++++++
> > > > >   arch/riscv/kernel/vector.S         | 84 ++++++++++++++++++++++++++++++
> > > > >   4 files changed, 189 insertions(+)
> > > > >   create mode 100644 arch/riscv/kernel/vector.S
> > > > >
> > > > > diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
> > > > > index ec83770b3d98..de0573dad78f 100644
> > > > > --- a/arch/riscv/include/asm/switch_to.h
> > > > > +++ b/arch/riscv/include/asm/switch_to.h
> > > > > @@ -7,10 +7,12 @@
> > > > >   #define _ASM_RISCV_SWITCH_TO_H
> > > > >
> > > > >   #include <linux/jump_label.h>
> > > > > +#include <linux/slab.h>
> > > > >   #include <linux/sched/task_stack.h>
> > > > >   #include <asm/processor.h>
> > > > >   #include <asm/ptrace.h>
> > > > >   #include <asm/csr.h>
> > > > > +#include <asm/asm-offsets.h>
> > > > >
> > > > >   #ifdef CONFIG_FPU
> > > > >   extern void __fstate_save(struct task_struct *save_to);
> > > > > @@ -68,6 +70,68 @@ static __always_inline bool has_fpu(void) { return false; }
> > > > >   #define __switch_to_fpu(__prev, __next) do { } while (0)
> > > > >   #endif
> > > > >
> > > > > +#ifdef CONFIG_VECTOR
> > > > > +extern bool has_vector;
> > > > > +extern unsigned long riscv_vsize;
> > > > > +extern void __vstate_save(struct __riscv_v_state *save_to, void *datap);
> > > > > +extern void __vstate_restore(struct __riscv_v_state *restore_from, void *datap);
> > > > > +
> > > > > +static inline void __vstate_clean(struct pt_regs *regs)
> > > > > +{
> > > > > +     regs->status = (regs->status & ~(SR_VS)) | SR_VS_CLEAN;
> > > > > +}
> > > > > +
> > > > > +static inline void vstate_off(struct task_struct *task,
> > > > > +                           struct pt_regs *regs)
> > > > > +{
> > > > > +     regs->status = (regs->status & ~SR_VS) | SR_VS_OFF;
> > > > > +}
> > > > > +
> > > > > +static inline void vstate_save(struct task_struct *task,
> > > > > +                            struct pt_regs *regs)
> > > > > +{
> > > > > +     if ((regs->status & SR_VS) == SR_VS_DIRTY) {
> > > > > +             struct __riscv_v_state *vstate = &(task->thread.vstate);
> > > > > +
> > > > > +             __vstate_save(vstate, vstate->datap);
> > > > > +             __vstate_clean(regs);
> > > > > +     }
> > > > > +}
> > > > > +
> > > > > +static inline void vstate_restore(struct task_struct *task,
> > > > > +                               struct pt_regs *regs)
> > > > > +{
> > > > > +     if ((regs->status & SR_VS) != SR_VS_OFF) {
> > > > > +             struct __riscv_v_state *vstate = &(task->thread.vstate);
> > > > > +
> > > > > +             /* Allocate space for vector registers. */
> > > > > +             if (!vstate->datap) {
> > > > > +                     vstate->datap = kzalloc(riscv_vsize, GFP_ATOMIC);
> > > > > +                     vstate->size = riscv_vsize;
> > > > > +             }
> > > > > +             __vstate_restore(vstate, vstate->datap);
> > > > > +             __vstate_clean(regs);
> > > > > +     }
> > > > > +}
> > > > > +
> > > > > +static inline void __switch_to_vector(struct task_struct *prev,
> > > > > +                                struct task_struct *next)
> > > > > +{
> > > > > +     struct pt_regs *regs;
> > > > > +
> > > > > +     regs = task_pt_regs(prev);
> > > > > +     if (unlikely(regs->status & SR_SD))
> > > > > +             vstate_save(prev, regs);
> > > > > +     vstate_restore(next, task_pt_regs(next));
> > > > > +}
> > > > > +
> > > > > +#else
> > > > > +#define has_vector false
> > > > > +#define vstate_save(task, regs) do { } while (0)
> > > > > +#define vstate_restore(task, regs) do { } while (0)
> > > > > +#define __switch_to_vector(__prev, __next) do { } while (0)
> > > > > +#endif
> > > > > +
> > > > >   extern struct task_struct *__switch_to(struct task_struct *,
> > > > >                                      struct task_struct *);
> > > > >
> > > > > @@ -77,6 +141,8 @@ do {                                                       \
> > > > >       struct task_struct *__next = (next);            \
> > > > >       if (has_fpu())                                  \
> > > > >               __switch_to_fpu(__prev, __next);        \
> > > > > +     if (has_vector)                                 \
> > > > > +             __switch_to_vector(__prev, __next);     \
> > > > >       ((last) = __switch_to(__prev, __next));         \
> > > > >   } while (0)
> > > > >
> > > > > diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> > > > > index 3397ddac1a30..344078080839 100644
> > > > > --- a/arch/riscv/kernel/Makefile
> > > > > +++ b/arch/riscv/kernel/Makefile
> > > > > @@ -40,6 +40,7 @@ obj-$(CONFIG_MMU) += vdso.o vdso/
> > > > >
> > > > >   obj-$(CONFIG_RISCV_M_MODE)  += traps_misaligned.o
> > > > >   obj-$(CONFIG_FPU)           += fpu.o
> > > > > +obj-$(CONFIG_VECTOR)         += vector.o
> > > > >   obj-$(CONFIG_SMP)           += smpboot.o
> > > > >   obj-$(CONFIG_SMP)           += smp.o
> > > > >   obj-$(CONFIG_SMP)           += cpu_ops.o
> > > > > diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> > > > > index 03ac3aa611f5..0b86e9e531c9 100644
> > > > > --- a/arch/riscv/kernel/process.c
> > > > > +++ b/arch/riscv/kernel/process.c
> > > > > @@ -95,6 +95,16 @@ void start_thread(struct pt_regs *regs, unsigned long pc,
> > > > >                */
> > > > >               fstate_restore(current, regs);
> > > > >       }
> > > > > +
> > > > > +     if (has_vector) {
> > > > > +             regs->status |= SR_VS_INITIAL;
> > > > > +             /*
> > > > > +              * Restore the initial value to the vector register
> > > > > +              * before starting the user program.
> > > > > +              */
> > > > > +             vstate_restore(current, regs);
> > > > > +     }
> > > > > +
> > > >
> > > > So this will unconditionally enable vector instructions, and allocate
> > > > memory for vector state, for all processes, regardless of whether vector
> > > > instructions are used?
> > > >
> > >
> > > Hi Darius,
> > >
> > > Yes, it will enable vector if has_vector() is true. The reason that we
> > > choose to enable and allocate memory for user space program is because
> > > we also implement some common functions in the glibc such as memcpy
> > > vector version and it is called very often by every process. So that
> > > we assume if the user program is running in a CPU with vector ISA
> > > would like to use vector by default. If we disable it by default and
> > > make it trigger the illegal instruction, that might be a burden since
> > > almost every process will use vector glibc memcpy or something like
> > > that.
> >
> > Do you have any evidence to support the assertion that almost every process
> > would use vector operations?  One could easily argue that the converse is
> > true: no existing software uses the vector extension now, so most likely a
> > process will not be using it.
> >
> > >
> > > > Given the size of the vector state and potential power and performance
> > > > implications of enabling the vector engine, it seems like this should
> > > > treated similarly to Intel AMX on x86.  The full discussion of that is
> > > > here:
> > > >
> > > > https://lore.kernel.org/lkml/CALCETrW2QHa2TLvnUuVxAAheqcbSZ-5_WRXtDSAGcbG8N+gtdQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org/
> > > >
> > > > The cover letter for recent Intel AMX patches has a summary of the x86
> > > > implementation:
> > > >
> > > > https://lore.kernel.org/lkml/20210825155413.19673-1-chang.seok.bae@intel.com/
> > > >
> > > > If RISC-V were to adopt a similar approach, I think the significant
> > > > points are:
> > > >
> > > >   1. A process (or thread) must specifically request the desire to use
> > > > vector extensions (perhaps with some new arch_prctl() API),
> > > >
> > > >   2. The kernel is free to deny permission, perhaps based on
> > > > administrative rules or for other reasons, and
> > > >
> > > >   3. If a process attempts to use vector extensions before doing the
> > > > above, the process will die due to an illegal instruction.
> > >
> > > Thank you for sharing this, but I am not sure if we should treat
> > > vector like AMX on x86. IMHO, compiler might generate code with vector
> > > instructions automatically someday, maybe we should treat vector
> > > extensions like other extensions.
> > > If user knows the vector extension is supported in this CPU and he
> > > would like to use it, it seems we should let user use it directly just
> > > like other extensions.
> > > If user don't know it exists or not, user should use the library API
> > > transparently and let glibc or other library deal with it. The glibc
> > > ifunc feature or multi-lib should be able to choose the correct
> > > implementation.
> >
> > What makes me think that the vector extension should be treated like AMX is
> > that they both (1) have a significant amount of architectural state, and
> > (2) likely have a significant power and/or area impact on (non-emulated)
> > designs.
> >
> > For example, I think it is possible, maybe even likely, that vector
> > implementations will have one or more of the following behaviors:
> >
> >   1. A single vector unit shared among two or more harts,
> >
> >   2. Additional power consumption when the vector unit is enabled and idle
> > versus not being enabled at all,
> >
> >   3. For a system which supports variable operating frequency, a reduction
> > in the maximum frequency when the vector unit is enabled, and/or
> >
> >   4. The inability to enter low power states and/or delays to low power
> > states transitions when the vector unit is enabled.
> >
> > None of the above constraints apply to more ordinary extensions like
> > compressed or the various bit manipulation extensions.
> >
> > The discussion I linked to has some well reasoned arguments on why
> > substantial extensions should have a mechanism to request using them by
> > user space.  The discussion was in the context of Intel AMX, but applies to
> > further x86 extensions, and I think should also apply to similar extensions
> > on RISC-V, like vector here.
> >
> There is possible use case where not all cores support vector
> extension due to size, area and power.
> Perhaps can have the mechanism or flow to determine the
> application/thread require vector extension or it specifically request
> the desire to use
> vector extensions. Then this app/thread run on cpu with vector
> extension capability only.
>

IIRC, we assume all harts has the same ability in Linux because of SMP
assumption.
If we have more information of hw capability and we may use this
information for scheduler to switch the correct process to the correct
CPU.
Do you have any idea how to implement it in Linux kernel? Maybe we can
list in the TODO list.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v8 20/21] riscv: Optimize task switch codes of vector
  2021-09-15 14:29   ` Jisheng Zhang
@ 2021-10-04 14:13     ` Greentime Hu
  0 siblings, 0 replies; 55+ messages in thread
From: Greentime Hu @ 2021-10-04 14:13 UTC (permalink / raw)
  To: Jisheng Zhang
  Cc: linux-riscv, Linux Kernel Mailing List, Albert Ou,
	Palmer Dabbelt, Paul Walmsley, Vincent Chen

Jisheng Zhang <jszhang3@mail.ustc.edu.cn> 於 2021年9月15日 週三 下午10:36寫道:
>
> On Thu,  9 Sep 2021 01:45:32 +0800
> Greentime Hu <greentime.hu@sifive.com> wrote:
>
> > This patch replacees 2 instructions with 1 instruction to do the same thing
> > . rs1=x0 with rd != x0 is a special form of the instruction that sets vl to
> > MAXVL.
>
> Similarly, the vector.S is newly introduced in this patch set, so could
> this optimization be folded into the __vstate_save and __vstate_restore
> introduction patch? Or it's better to keep this optimizaion in commit log?
>
Yeah, I wanted to keep the optimization commit log before, but it's ok
if you think merge code is easier to read.
I'll merge this patch in the next version.

> >
> > Suggested-by: Andrew Waterman <andrew@sifive.com>
> > Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
> > Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
> > Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
> > ---
> >  arch/riscv/kernel/vector.S | 9 +++------
> >  1 file changed, 3 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/riscv/kernel/vector.S b/arch/riscv/kernel/vector.S
> > index 4f0c5a166e4e..f7223c81b11a 100644
> > --- a/arch/riscv/kernel/vector.S
> > +++ b/arch/riscv/kernel/vector.S
> > @@ -27,8 +27,7 @@
> >  #define x_vl     t2
> >  #define x_vcsr   t3
> >  #define incr     t4
> > -#define m_one    t5
> > -#define status   t6
> > +#define status   t5
> >
> >  ENTRY(__vstate_save)
> >       li      status, SR_VS
> > @@ -38,8 +37,7 @@ ENTRY(__vstate_save)
> >       csrr    x_vtype, CSR_VTYPE
> >       csrr    x_vl, CSR_VL
> >       csrr    x_vcsr, CSR_VCSR
> > -     li      m_one, -1
> > -     vsetvli incr, m_one, e8, m8
> > +     vsetvli incr, x0, e8, m8
> >       vse8.v   v0, (datap)
> >       add     datap, datap, incr
> >       vse8.v   v8, (datap)
> > @@ -61,8 +59,7 @@ ENTRY(__vstate_restore)
> >       li      status, SR_VS
> >       csrs    CSR_STATUS, status
> >
> > -     li      m_one, -1
> > -     vsetvli incr, m_one, e8, m8
> > +     vsetvli incr, x0, e8, m8
> >       vle8.v   v0, (datap)
> >       add     datap, datap, incr
> >       vle8.v   v8, (datap)
>
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v8 21/21] riscv: Turn has_vector into a static key if VECTOR=y
  2021-09-15 14:24   ` Jisheng Zhang
@ 2021-10-04 15:04     ` Greentime Hu
  0 siblings, 0 replies; 55+ messages in thread
From: Greentime Hu @ 2021-10-04 15:04 UTC (permalink / raw)
  To: Jisheng Zhang
  Cc: linux-riscv, Linux Kernel Mailing List, Albert Ou,
	Palmer Dabbelt, Paul Walmsley, Vincent Chen

Jisheng Zhang <jszhang3@mail.ustc.edu.cn> 於 2021年9月15日 週三 下午10:31寫道:
>
> On Thu,  9 Sep 2021 01:45:33 +0800
> Greentime Hu <greentime.hu@sifive.com> wrote:
>
> > Just like fpu, we can use static key for has_vector.
> > The has_vector check sits at hot code path: switch_to(). Currently,
> > has_vector is a bool variable if VECTOR=y, switch_to() checks it each time,
> > we can optimize out this check by turning the has_vector into a static key.
> >
>
> has_vector is newly introduced in this patch set so I believe this patch can
> be folded into has_vector introducing patch, I.E patch 6
>
ok, I'll split this patch and merge these codes into its related patches.

>
> > Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
> > ---
> >  arch/riscv/include/asm/switch_to.h     | 10 +++++++---
> >  arch/riscv/kernel/cpufeature.c         |  4 ++--
> >  arch/riscv/kernel/kernel_mode_vector.c |  4 ++--
> >  arch/riscv/kernel/process.c            |  8 ++++----
> >  arch/riscv/kernel/signal.c             |  6 +++---
> >  5 files changed, 18 insertions(+), 14 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
> > index b48c9c974564..576204217e0f 100644
> > --- a/arch/riscv/include/asm/switch_to.h
> > +++ b/arch/riscv/include/asm/switch_to.h
> > @@ -71,7 +71,11 @@ static __always_inline bool has_fpu(void) { return false; }
> >  #endif
> >
> >  #ifdef CONFIG_VECTOR
> > -extern bool has_vector;
> > +extern struct static_key_false cpu_hwcap_vector;
> > +static __always_inline bool has_vector(void)
> > +{
> > +     return static_branch_likely(&cpu_hwcap_vector);
> > +}
> >  extern unsigned long riscv_vsize;
> >  extern void __vstate_save(struct __riscv_v_state *save_to, void *datap);
> >  extern void __vstate_restore(struct __riscv_v_state *restore_from, void *datap);
> > @@ -120,7 +124,7 @@ static inline void __switch_to_vector(struct task_struct *prev,
> >  }
> >
> >  #else
> > -#define has_vector false
> > +static __always_inline bool has_vector(void) { return false; }
> >  #define riscv_vsize (0)
> >  #define vstate_save(task, regs) do { } while (0)
> >  #define vstate_restore(task, regs) do { } while (0)
> > @@ -136,7 +140,7 @@ do {                                                      \
> >       struct task_struct *__next = (next);            \
> >       if (has_fpu())                                  \
> >               __switch_to_fpu(__prev, __next);        \
> > -     if (has_vector)                                 \
> > +     if (has_vector())                                       \
> >               __switch_to_vector(__prev, __next);     \
> >       ((last) = __switch_to(__prev, __next));         \
> >  } while (0)
> > diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> > index af984f875f60..0139ec20adce 100644
> > --- a/arch/riscv/kernel/cpufeature.c
> > +++ b/arch/riscv/kernel/cpufeature.c
> > @@ -23,7 +23,7 @@ __ro_after_init DEFINE_STATIC_KEY_FALSE(cpu_hwcap_fpu);
> >  #endif
> >  #ifdef CONFIG_VECTOR
> >  #include <asm/vector.h>
> > -bool has_vector __read_mostly;
> > +__ro_after_init DEFINE_STATIC_KEY_FALSE(cpu_hwcap_vector);
> >  unsigned long riscv_vsize __read_mostly;
> >  #endif
> >
> > @@ -157,7 +157,7 @@ void __init riscv_fill_hwcap(void)
> >
> >  #ifdef CONFIG_VECTOR
> >       if (elf_hwcap & COMPAT_HWCAP_ISA_V) {
> > -             has_vector = true;
> > +             static_branch_enable(&cpu_hwcap_vector);
> >               /* There are 32 vector registers with vlenb length. */
> >               rvv_enable();
> >               riscv_vsize = csr_read(CSR_VLENB) * 32;
> > diff --git a/arch/riscv/kernel/kernel_mode_vector.c b/arch/riscv/kernel/kernel_mode_vector.c
> > index 0d990bd8b8dd..0d08954c30af 100644
> > --- a/arch/riscv/kernel/kernel_mode_vector.c
> > +++ b/arch/riscv/kernel/kernel_mode_vector.c
> > @@ -110,7 +110,7 @@ static void vector_flush_cpu_state(void)
> >   */
> >  void kernel_rvv_begin(void)
> >  {
> > -     if (WARN_ON(!has_vector))
> > +     if (WARN_ON(!has_vector()))
> >               return;
> >
> >       WARN_ON(!may_use_vector());
> > @@ -140,7 +140,7 @@ EXPORT_SYMBOL(kernel_rvv_begin);
> >   */
> >  void kernel_rvv_end(void)
> >  {
> > -     if (WARN_ON(!has_vector))
> > +     if (WARN_ON(!has_vector()))
> >               return;
> >
> >       /* Invalidate vector regs */
> > diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> > index 05ff5f934e7e..62540815ba1c 100644
> > --- a/arch/riscv/kernel/process.c
> > +++ b/arch/riscv/kernel/process.c
> > @@ -96,7 +96,7 @@ void start_thread(struct pt_regs *regs, unsigned long pc,
> >               fstate_restore(current, regs);
> >       }
> >
> > -     if (has_vector) {
> > +     if (has_vector()) {
> >               struct __riscv_v_state *vstate = &(current->thread.vstate);
> >
> >               /* Enable vector and allocate memory for vector registers. */
> > @@ -141,11 +141,11 @@ void flush_thread(void)
> >  int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
> >  {
> >       fstate_save(src, task_pt_regs(src));
> > -     if (has_vector)
> > +     if (has_vector())
> >               /* To make sure every dirty vector context is saved. */
> >               vstate_save(src, task_pt_regs(src));
> >       *dst = *src;
> > -     if (has_vector) {
> > +     if (has_vector()) {
> >               /* Copy vector context to the forked task from parent. */
> >               if ((task_pt_regs(src)->status & SR_VS) != SR_VS_OFF) {
> >                       dst->thread.vstate.datap = kzalloc(riscv_vsize, GFP_KERNEL);
> > @@ -164,7 +164,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
> >  void arch_release_task_struct(struct task_struct *tsk)
> >  {
> >       /* Free the vector context of datap. */
> > -     if (has_vector)
> > +     if (has_vector())
> >               kfree(tsk->thread.vstate.datap);
> >  }
> >
> > diff --git a/arch/riscv/kernel/signal.c b/arch/riscv/kernel/signal.c
> > index d30a3b588156..6a19b4b7b206 100644
> > --- a/arch/riscv/kernel/signal.c
> > +++ b/arch/riscv/kernel/signal.c
> > @@ -192,7 +192,7 @@ static long restore_sigcontext(struct pt_regs *regs,
> >                               goto invalid;
> >                       goto done;
> >               case RVV_MAGIC:
> > -                     if (!has_vector)
> > +                     if (!has_vector())
> >                               goto invalid;
> >                       if (size != rvv_sc_size)
> >                               goto invalid;
> > @@ -221,7 +221,7 @@ static size_t cal_rt_frame_size(void)
> >
> >       frame_size = sizeof(*frame);
> >
> > -     if (has_vector)
> > +     if (has_vector())
> >               total_context_size += rvv_sc_size;
> >       /* Preserved a __riscv_ctx_hdr for END signal context header. */
> >       total_context_size += sizeof(struct __riscv_ctx_hdr);
> > @@ -288,7 +288,7 @@ static long setup_sigcontext(struct rt_sigframe __user *frame,
> >       if (has_fpu())
> >               err |= save_fp_state(regs, &sc->sc_fpregs);
> >       /* Save the vector state. */
> > -     if (has_vector)
> > +     if (has_vector())
> >               err |= save_v_state(regs, &sc_reserved_free_ptr);
> >
> >       /* Put END __riscv_ctx_hdr at the end. */
>
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v8 09/21] riscv: Add task switch support for vector
  2021-10-04 12:41           ` Greentime Hu
@ 2021-10-05  2:12             ` Ley Foon Tan
  2021-10-05 15:46               ` Greentime Hu
  0 siblings, 1 reply; 55+ messages in thread
From: Ley Foon Tan @ 2021-10-05  2:12 UTC (permalink / raw)
  To: Greentime Hu
  Cc: Darius Rad, linux-riscv, Linux Kernel Mailing List, Albert Ou,
	Palmer Dabbelt, Paul Walmsley, Vincent Chen

On Mon, Oct 4, 2021 at 8:41 PM Greentime Hu <greentime.hu@sifive.com> wrote:
>
> Ley Foon Tan <lftan.linux@gmail.com> 於 2021年10月1日 週五 上午10:46寫道:
> >
> > On Wed, Sep 29, 2021 at 11:54 PM Darius Rad <darius@bluespec.com> wrote:
> > >
> > > On Tue, Sep 28, 2021 at 10:56:52PM +0800, Greentime Hu wrote:
> > > > Darius Rad <darius@bluespec.com> 於 2021年9月13日 週一 下午8:21寫道:
> > > > >
[....]


> > > > >
> > > > > So this will unconditionally enable vector instructions, and allocate
> > > > > memory for vector state, for all processes, regardless of whether vector
> > > > > instructions are used?
> > > > >
> > > >
> > > > Hi Darius,
> > > >
> > > > Yes, it will enable vector if has_vector() is true. The reason that we
> > > > choose to enable and allocate memory for user space program is because
> > > > we also implement some common functions in the glibc such as memcpy
> > > > vector version and it is called very often by every process. So that
> > > > we assume if the user program is running in a CPU with vector ISA
> > > > would like to use vector by default. If we disable it by default and
> > > > make it trigger the illegal instruction, that might be a burden since
> > > > almost every process will use vector glibc memcpy or something like
> > > > that.
> > >
> > > Do you have any evidence to support the assertion that almost every process
> > > would use vector operations?  One could easily argue that the converse is
> > > true: no existing software uses the vector extension now, so most likely a
> > > process will not be using it.
> > >
> > > >
> > > > > Given the size of the vector state and potential power and performance
> > > > > implications of enabling the vector engine, it seems like this should
> > > > > treated similarly to Intel AMX on x86.  The full discussion of that is
> > > > > here:
> > > > >
> > > > > https://lore.kernel.org/lkml/CALCETrW2QHa2TLvnUuVxAAheqcbSZ-5_WRXtDSAGcbG8N+gtdQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org/
> > > > >
> > > > > The cover letter for recent Intel AMX patches has a summary of the x86
> > > > > implementation:
> > > > >
> > > > > https://lore.kernel.org/lkml/20210825155413.19673-1-chang.seok.bae@intel.com/
> > > > >
> > > > > If RISC-V were to adopt a similar approach, I think the significant
> > > > > points are:
> > > > >
> > > > >   1. A process (or thread) must specifically request the desire to use
> > > > > vector extensions (perhaps with some new arch_prctl() API),
> > > > >
> > > > >   2. The kernel is free to deny permission, perhaps based on
> > > > > administrative rules or for other reasons, and
> > > > >
> > > > >   3. If a process attempts to use vector extensions before doing the
> > > > > above, the process will die due to an illegal instruction.
> > > >
> > > > Thank you for sharing this, but I am not sure if we should treat
> > > > vector like AMX on x86. IMHO, compiler might generate code with vector
> > > > instructions automatically someday, maybe we should treat vector
> > > > extensions like other extensions.
> > > > If user knows the vector extension is supported in this CPU and he
> > > > would like to use it, it seems we should let user use it directly just
> > > > like other extensions.
> > > > If user don't know it exists or not, user should use the library API
> > > > transparently and let glibc or other library deal with it. The glibc
> > > > ifunc feature or multi-lib should be able to choose the correct
> > > > implementation.
> > >
> > > What makes me think that the vector extension should be treated like AMX is
> > > that they both (1) have a significant amount of architectural state, and
> > > (2) likely have a significant power and/or area impact on (non-emulated)
> > > designs.
> > >
> > > For example, I think it is possible, maybe even likely, that vector
> > > implementations will have one or more of the following behaviors:
> > >
> > >   1. A single vector unit shared among two or more harts,
> > >
> > >   2. Additional power consumption when the vector unit is enabled and idle
> > > versus not being enabled at all,
> > >
> > >   3. For a system which supports variable operating frequency, a reduction
> > > in the maximum frequency when the vector unit is enabled, and/or
> > >
> > >   4. The inability to enter low power states and/or delays to low power
> > > states transitions when the vector unit is enabled.
> > >
> > > None of the above constraints apply to more ordinary extensions like
> > > compressed or the various bit manipulation extensions.
> > >
> > > The discussion I linked to has some well reasoned arguments on why
> > > substantial extensions should have a mechanism to request using them by
> > > user space.  The discussion was in the context of Intel AMX, but applies to
> > > further x86 extensions, and I think should also apply to similar extensions
> > > on RISC-V, like vector here.
> > >
> > There is possible use case where not all cores support vector
> > extension due to size, area and power.
> > Perhaps can have the mechanism or flow to determine the
> > application/thread require vector extension or it specifically request
> > the desire to use
> > vector extensions. Then this app/thread run on cpu with vector
> > extension capability only.
> >
>
> IIRC, we assume all harts has the same ability in Linux because of SMP
> assumption.
> If we have more information of hw capability and we may use this
> information for scheduler to switch the correct process to the correct
> CPU.
> Do you have any idea how to implement it in Linux kernel? Maybe we can
> list in the TODO list.
I think we can refer to other arch implementations as reference:

1. ARM64 supports 32-bit thread on asymmetric AArch32 systems. There
is a flag in ELF to check, then start the thread on the core that
supports 32-bit execution. This patchset is merged to mainline 5.15.
https://lore.kernel.org/linux-arm-kernel/20210730112443.23245-8-will@kernel.org/T/

2. Link shared by Darius, on-demand request implementation on Intel AMX
https://lore.kernel.org/lkml/20210825155413.19673-1-chang.seok.bae@intel.com/

glibc support optimized library functions with vector, this is enabled
by default if compiler is with vector extension enabled? If yes, then
most of the app required vector core.

Regards
Ley Foon

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v8 09/21] riscv: Add task switch support for vector
  2021-10-04 12:36         ` Greentime Hu
@ 2021-10-05 13:57           ` Darius Rad
  2021-10-21  1:01             ` Paul Walmsley
  0 siblings, 1 reply; 55+ messages in thread
From: Darius Rad @ 2021-10-05 13:57 UTC (permalink / raw)
  To: Greentime Hu
  Cc: linux-riscv, Linux Kernel Mailing List, Albert Ou,
	Palmer Dabbelt, Paul Walmsley, Vincent Chen

On Mon, Oct 04, 2021 at 08:36:30PM +0800, Greentime Hu wrote:
> Darius Rad <darius@bluespec.com> 於 2021年9月29日 週三 下午9:28寫道:
> >
> > On Tue, Sep 28, 2021 at 10:56:52PM +0800, Greentime Hu wrote:
> > > Darius Rad <darius@bluespec.com> 於 2021年9月13日 週一 下午8:21寫道:
> > > >
> > > > On 9/8/21 1:45 PM, Greentime Hu wrote:
> > > > > This patch adds task switch support for vector. It supports partial lazy
> > > > > save and restore mechanism. It also supports all lengths of vlen.
> > > > >
> > > > > [guoren@linux.alibaba.com: First available porting to support vector
> > > > > context switching]
> > > > > [nick.knight@sifive.com: Rewrite vector.S to support dynamic vlen, xlen and
> > > > > code refine]
> > > > > [vincent.chen@sifive.co: Fix the might_sleep issue in vstate_save,
> > > > > vstate_restore]
> > > > > Co-developed-by: Nick Knight <nick.knight@sifive.com>
> > > > > Signed-off-by: Nick Knight <nick.knight@sifive.com>
> > > > > Co-developed-by: Guo Ren <guoren@linux.alibaba.com>
> > > > > Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
> > > > > Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
> > > > > Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
> > > > > Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
> > > > > ---
> > > > >   arch/riscv/include/asm/switch_to.h | 66 +++++++++++++++++++++++
> > > > >   arch/riscv/kernel/Makefile         |  1 +
> > > > >   arch/riscv/kernel/process.c        | 38 ++++++++++++++
> > > > >   arch/riscv/kernel/vector.S         | 84 ++++++++++++++++++++++++++++++
> > > > >   4 files changed, 189 insertions(+)
> > > > >   create mode 100644 arch/riscv/kernel/vector.S
> > > > >
> > > > > diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
> > > > > index ec83770b3d98..de0573dad78f 100644
> > > > > --- a/arch/riscv/include/asm/switch_to.h
> > > > > +++ b/arch/riscv/include/asm/switch_to.h
> > > > > @@ -7,10 +7,12 @@
> > > > >   #define _ASM_RISCV_SWITCH_TO_H
> > > > >
> > > > >   #include <linux/jump_label.h>
> > > > > +#include <linux/slab.h>
> > > > >   #include <linux/sched/task_stack.h>
> > > > >   #include <asm/processor.h>
> > > > >   #include <asm/ptrace.h>
> > > > >   #include <asm/csr.h>
> > > > > +#include <asm/asm-offsets.h>
> > > > >
> > > > >   #ifdef CONFIG_FPU
> > > > >   extern void __fstate_save(struct task_struct *save_to);
> > > > > @@ -68,6 +70,68 @@ static __always_inline bool has_fpu(void) { return false; }
> > > > >   #define __switch_to_fpu(__prev, __next) do { } while (0)
> > > > >   #endif
> > > > >
> > > > > +#ifdef CONFIG_VECTOR
> > > > > +extern bool has_vector;
> > > > > +extern unsigned long riscv_vsize;
> > > > > +extern void __vstate_save(struct __riscv_v_state *save_to, void *datap);
> > > > > +extern void __vstate_restore(struct __riscv_v_state *restore_from, void *datap);
> > > > > +
> > > > > +static inline void __vstate_clean(struct pt_regs *regs)
> > > > > +{
> > > > > +     regs->status = (regs->status & ~(SR_VS)) | SR_VS_CLEAN;
> > > > > +}
> > > > > +
> > > > > +static inline void vstate_off(struct task_struct *task,
> > > > > +                           struct pt_regs *regs)
> > > > > +{
> > > > > +     regs->status = (regs->status & ~SR_VS) | SR_VS_OFF;
> > > > > +}
> > > > > +
> > > > > +static inline void vstate_save(struct task_struct *task,
> > > > > +                            struct pt_regs *regs)
> > > > > +{
> > > > > +     if ((regs->status & SR_VS) == SR_VS_DIRTY) {
> > > > > +             struct __riscv_v_state *vstate = &(task->thread.vstate);
> > > > > +
> > > > > +             __vstate_save(vstate, vstate->datap);
> > > > > +             __vstate_clean(regs);
> > > > > +     }
> > > > > +}
> > > > > +
> > > > > +static inline void vstate_restore(struct task_struct *task,
> > > > > +                               struct pt_regs *regs)
> > > > > +{
> > > > > +     if ((regs->status & SR_VS) != SR_VS_OFF) {
> > > > > +             struct __riscv_v_state *vstate = &(task->thread.vstate);
> > > > > +
> > > > > +             /* Allocate space for vector registers. */
> > > > > +             if (!vstate->datap) {
> > > > > +                     vstate->datap = kzalloc(riscv_vsize, GFP_ATOMIC);
> > > > > +                     vstate->size = riscv_vsize;
> > > > > +             }
> > > > > +             __vstate_restore(vstate, vstate->datap);
> > > > > +             __vstate_clean(regs);
> > > > > +     }
> > > > > +}
> > > > > +
> > > > > +static inline void __switch_to_vector(struct task_struct *prev,
> > > > > +                                struct task_struct *next)
> > > > > +{
> > > > > +     struct pt_regs *regs;
> > > > > +
> > > > > +     regs = task_pt_regs(prev);
> > > > > +     if (unlikely(regs->status & SR_SD))
> > > > > +             vstate_save(prev, regs);
> > > > > +     vstate_restore(next, task_pt_regs(next));
> > > > > +}
> > > > > +
> > > > > +#else
> > > > > +#define has_vector false
> > > > > +#define vstate_save(task, regs) do { } while (0)
> > > > > +#define vstate_restore(task, regs) do { } while (0)
> > > > > +#define __switch_to_vector(__prev, __next) do { } while (0)
> > > > > +#endif
> > > > > +
> > > > >   extern struct task_struct *__switch_to(struct task_struct *,
> > > > >                                      struct task_struct *);
> > > > >
> > > > > @@ -77,6 +141,8 @@ do {                                                       \
> > > > >       struct task_struct *__next = (next);            \
> > > > >       if (has_fpu())                                  \
> > > > >               __switch_to_fpu(__prev, __next);        \
> > > > > +     if (has_vector)                                 \
> > > > > +             __switch_to_vector(__prev, __next);     \
> > > > >       ((last) = __switch_to(__prev, __next));         \
> > > > >   } while (0)
> > > > >
> > > > > diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> > > > > index 3397ddac1a30..344078080839 100644
> > > > > --- a/arch/riscv/kernel/Makefile
> > > > > +++ b/arch/riscv/kernel/Makefile
> > > > > @@ -40,6 +40,7 @@ obj-$(CONFIG_MMU) += vdso.o vdso/
> > > > >
> > > > >   obj-$(CONFIG_RISCV_M_MODE)  += traps_misaligned.o
> > > > >   obj-$(CONFIG_FPU)           += fpu.o
> > > > > +obj-$(CONFIG_VECTOR)         += vector.o
> > > > >   obj-$(CONFIG_SMP)           += smpboot.o
> > > > >   obj-$(CONFIG_SMP)           += smp.o
> > > > >   obj-$(CONFIG_SMP)           += cpu_ops.o
> > > > > diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> > > > > index 03ac3aa611f5..0b86e9e531c9 100644
> > > > > --- a/arch/riscv/kernel/process.c
> > > > > +++ b/arch/riscv/kernel/process.c
> > > > > @@ -95,6 +95,16 @@ void start_thread(struct pt_regs *regs, unsigned long pc,
> > > > >                */
> > > > >               fstate_restore(current, regs);
> > > > >       }
> > > > > +
> > > > > +     if (has_vector) {
> > > > > +             regs->status |= SR_VS_INITIAL;
> > > > > +             /*
> > > > > +              * Restore the initial value to the vector register
> > > > > +              * before starting the user program.
> > > > > +              */
> > > > > +             vstate_restore(current, regs);
> > > > > +     }
> > > > > +
> > > >
> > > > So this will unconditionally enable vector instructions, and allocate
> > > > memory for vector state, for all processes, regardless of whether vector
> > > > instructions are used?
> > > >
> > >
> > > Hi Darius,
> > >
> > > Yes, it will enable vector if has_vector() is true. The reason that we
> > > choose to enable and allocate memory for user space program is because
> > > we also implement some common functions in the glibc such as memcpy
> > > vector version and it is called very often by every process. So that
> > > we assume if the user program is running in a CPU with vector ISA
> > > would like to use vector by default. If we disable it by default and
> > > make it trigger the illegal instruction, that might be a burden since
> > > almost every process will use vector glibc memcpy or something like
> > > that.
> >
> > Do you have any evidence to support the assertion that almost every process
> > would use vector operations?  One could easily argue that the converse is
> > true: no existing software uses the vector extension now, so most likely a
> > process will not be using it.
> 
> Glibc ustreaming is just starting so you didn't see software using the
> vector extension now and this patchset is testing based on those
> optimized glibc too.
> Vincent Chen is working on the glibc vector support upstreaming and we
> will also upstream the vector version glibc memcpy, memcmp, memchr,
> memmove, memset, strcmp, strlen.
> Then we will see platform with vector support can use vector version
> mem* and str* functions automatically based on ifunc and platform
> without vector will use the original one automatically. These could be
> done to select the correct optimized glibc functions by ifunc
> mechanism.
> 
> >
> > >
> > > > Given the size of the vector state and potential power and performance
> > > > implications of enabling the vector engine, it seems like this should
> > > > treated similarly to Intel AMX on x86.  The full discussion of that is
> > > > here:
> > > >
> > > > https://lore.kernel.org/lkml/CALCETrW2QHa2TLvnUuVxAAheqcbSZ-5_WRXtDSAGcbG8N+gtdQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org/
> > > >
> > > > The cover letter for recent Intel AMX patches has a summary of the x86
> > > > implementation:
> > > >
> > > > https://lore.kernel.org/lkml/20210825155413.19673-1-chang.seok.bae@intel.com/
> > > >
> > > > If RISC-V were to adopt a similar approach, I think the significant
> > > > points are:
> > > >
> > > >   1. A process (or thread) must specifically request the desire to use
> > > > vector extensions (perhaps with some new arch_prctl() API),
> > > >
> > > >   2. The kernel is free to deny permission, perhaps based on
> > > > administrative rules or for other reasons, and
> > > >
> > > >   3. If a process attempts to use vector extensions before doing the
> > > > above, the process will die due to an illegal instruction.
> > >
> > > Thank you for sharing this, but I am not sure if we should treat
> > > vector like AMX on x86. IMHO, compiler might generate code with vector
> > > instructions automatically someday, maybe we should treat vector
> > > extensions like other extensions.
> > > If user knows the vector extension is supported in this CPU and he
> > > would like to use it, it seems we should let user use it directly just
> > > like other extensions.
> > > If user don't know it exists or not, user should use the library API
> > > transparently and let glibc or other library deal with it. The glibc
> > > ifunc feature or multi-lib should be able to choose the correct
> > > implementation.
> >
> > What makes me think that the vector extension should be treated like AMX is
> > that they both (1) have a significant amount of architectural state, and
> > (2) likely have a significant power and/or area impact on (non-emulated)
> > designs.
> >
> > For example, I think it is possible, maybe even likely, that vector
> > implementations will have one or more of the following behaviors:
> >
> >   1. A single vector unit shared among two or more harts,
> >
> >   2. Additional power consumption when the vector unit is enabled and idle
> > versus not being enabled at all,
> >
> >   3. For a system which supports variable operating frequency, a reduction
> > in the maximum frequency when the vector unit is enabled, and/or
> >
> >   4. The inability to enter low power states and/or delays to low power
> > states transitions when the vector unit is enabled.
> >
> > None of the above constraints apply to more ordinary extensions like
> > compressed or the various bit manipulation extensions.
> >
> > The discussion I linked to has some well reasoned arguments on why
> > substantial extensions should have a mechanism to request using them by
> > user space.  The discussion was in the context of Intel AMX, but applies to
> > further x86 extensions, and I think should also apply to similar extensions
> > on RISC-V, like vector here.
> 
> Have you ever checked the SVE/SVE2 of ARM64 implementation in Linux kernel too?
> IMHO, the vector of RISCV should be closer to the SVE2 of ARM64.

For SVE on arm64, memory is only allocated and the extension is only
enabled when a process is actively using it, which is not what this patch
set does.  If the memory allocation for state memory fails, it triggers a
BUG(); there is no graceful way to report this to the application.

To do something similar for RISC-V, you will need to write an illegal
instruction handler to retrieve the faulting instruction and partially
decode it enough to determine it is a vector instruction.  That seems
needlessly complicated, doesn't provide a way to gracefully report an
error if memory allocation fails, and doesn't provide any of the other
benefits that a defined API to request use of the vector extension would
provide.

Did you read the discussion on Intel AMX support that I previously linked
to?  There are well reasoned arguments why it is beneficial to require that
a process request access to substantial extensions, like RISC-V vector.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v8 09/21] riscv: Add task switch support for vector
  2021-10-05  2:12             ` Ley Foon Tan
@ 2021-10-05 15:46               ` Greentime Hu
  2021-10-07 10:10                 ` Ley Foon Tan
  0 siblings, 1 reply; 55+ messages in thread
From: Greentime Hu @ 2021-10-05 15:46 UTC (permalink / raw)
  To: Ley Foon Tan
  Cc: Darius Rad, linux-riscv, Linux Kernel Mailing List, Albert Ou,
	Palmer Dabbelt, Paul Walmsley, Vincent Chen

Ley Foon Tan <lftan.linux@gmail.com> 於 2021年10月5日 週二 上午10:12寫道:
>
> On Mon, Oct 4, 2021 at 8:41 PM Greentime Hu <greentime.hu@sifive.com> wrote:
> >
> > Ley Foon Tan <lftan.linux@gmail.com> 於 2021年10月1日 週五 上午10:46寫道:
> > >
> > > On Wed, Sep 29, 2021 at 11:54 PM Darius Rad <darius@bluespec.com> wrote:
> > > >
> > > > On Tue, Sep 28, 2021 at 10:56:52PM +0800, Greentime Hu wrote:
> > > > > Darius Rad <darius@bluespec.com> 於 2021年9月13日 週一 下午8:21寫道:
> > > > > >
> [....]
>
>
> > > > > >
> > > > > > So this will unconditionally enable vector instructions, and allocate
> > > > > > memory for vector state, for all processes, regardless of whether vector
> > > > > > instructions are used?
> > > > > >
> > > > >
> > > > > Hi Darius,
> > > > >
> > > > > Yes, it will enable vector if has_vector() is true. The reason that we
> > > > > choose to enable and allocate memory for user space program is because
> > > > > we also implement some common functions in the glibc such as memcpy
> > > > > vector version and it is called very often by every process. So that
> > > > > we assume if the user program is running in a CPU with vector ISA
> > > > > would like to use vector by default. If we disable it by default and
> > > > > make it trigger the illegal instruction, that might be a burden since
> > > > > almost every process will use vector glibc memcpy or something like
> > > > > that.
> > > >
> > > > Do you have any evidence to support the assertion that almost every process
> > > > would use vector operations?  One could easily argue that the converse is
> > > > true: no existing software uses the vector extension now, so most likely a
> > > > process will not be using it.
> > > >
> > > > >
> > > > > > Given the size of the vector state and potential power and performance
> > > > > > implications of enabling the vector engine, it seems like this should
> > > > > > treated similarly to Intel AMX on x86.  The full discussion of that is
> > > > > > here:
> > > > > >
> > > > > > https://lore.kernel.org/lkml/CALCETrW2QHa2TLvnUuVxAAheqcbSZ-5_WRXtDSAGcbG8N+gtdQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org/
> > > > > >
> > > > > > The cover letter for recent Intel AMX patches has a summary of the x86
> > > > > > implementation:
> > > > > >
> > > > > > https://lore.kernel.org/lkml/20210825155413.19673-1-chang.seok.bae@intel.com/
> > > > > >
> > > > > > If RISC-V were to adopt a similar approach, I think the significant
> > > > > > points are:
> > > > > >
> > > > > >   1. A process (or thread) must specifically request the desire to use
> > > > > > vector extensions (perhaps with some new arch_prctl() API),
> > > > > >
> > > > > >   2. The kernel is free to deny permission, perhaps based on
> > > > > > administrative rules or for other reasons, and
> > > > > >
> > > > > >   3. If a process attempts to use vector extensions before doing the
> > > > > > above, the process will die due to an illegal instruction.
> > > > >
> > > > > Thank you for sharing this, but I am not sure if we should treat
> > > > > vector like AMX on x86. IMHO, compiler might generate code with vector
> > > > > instructions automatically someday, maybe we should treat vector
> > > > > extensions like other extensions.
> > > > > If user knows the vector extension is supported in this CPU and he
> > > > > would like to use it, it seems we should let user use it directly just
> > > > > like other extensions.
> > > > > If user don't know it exists or not, user should use the library API
> > > > > transparently and let glibc or other library deal with it. The glibc
> > > > > ifunc feature or multi-lib should be able to choose the correct
> > > > > implementation.
> > > >
> > > > What makes me think that the vector extension should be treated like AMX is
> > > > that they both (1) have a significant amount of architectural state, and
> > > > (2) likely have a significant power and/or area impact on (non-emulated)
> > > > designs.
> > > >
> > > > For example, I think it is possible, maybe even likely, that vector
> > > > implementations will have one or more of the following behaviors:
> > > >
> > > >   1. A single vector unit shared among two or more harts,
> > > >
> > > >   2. Additional power consumption when the vector unit is enabled and idle
> > > > versus not being enabled at all,
> > > >
> > > >   3. For a system which supports variable operating frequency, a reduction
> > > > in the maximum frequency when the vector unit is enabled, and/or
> > > >
> > > >   4. The inability to enter low power states and/or delays to low power
> > > > states transitions when the vector unit is enabled.
> > > >
> > > > None of the above constraints apply to more ordinary extensions like
> > > > compressed or the various bit manipulation extensions.
> > > >
> > > > The discussion I linked to has some well reasoned arguments on why
> > > > substantial extensions should have a mechanism to request using them by
> > > > user space.  The discussion was in the context of Intel AMX, but applies to
> > > > further x86 extensions, and I think should also apply to similar extensions
> > > > on RISC-V, like vector here.
> > > >
> > > There is possible use case where not all cores support vector
> > > extension due to size, area and power.
> > > Perhaps can have the mechanism or flow to determine the
> > > application/thread require vector extension or it specifically request
> > > the desire to use
> > > vector extensions. Then this app/thread run on cpu with vector
> > > extension capability only.
> > >
> >
> > IIRC, we assume all harts has the same ability in Linux because of SMP
> > assumption.
> > If we have more information of hw capability and we may use this
> > information for scheduler to switch the correct process to the correct
> > CPU.
> > Do you have any idea how to implement it in Linux kernel? Maybe we can
> > list in the TODO list.
> I think we can refer to other arch implementations as reference:
>
> 1. ARM64 supports 32-bit thread on asymmetric AArch32 systems. There
> is a flag in ELF to check, then start the thread on the core that
> supports 32-bit execution. This patchset is merged to mainline 5.15.
> https://lore.kernel.org/linux-arm-kernel/20210730112443.23245-8-will@kernel.org/T/

Wow! This is useful for AMP.

>
> 2. Link shared by Darius, on-demand request implementation on Intel AMX
> https://lore.kernel.org/lkml/20210825155413.19673-1-chang.seok.bae@intel.com/
>
> glibc support optimized library functions with vector, this is enabled
> by default if compiler is with vector extension enabled? If yes, then
> most of the app required vector core.

As I mentioned earlier, glibc ifunc will solve this issue. The
Linux/glibc can run on platform with vector or without vector and
glibc will use the information get from Linux kernel and using ifunc
to decide whether it should use the vector version or not.
Which means even your toolchain has vector glibc support and your
Linux kernel told the glibc this platform doesn't support vector then
the ifunc mechanism will choose the non-vector version ones.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v8 09/21] riscv: Add task switch support for vector
  2021-10-05 15:46               ` Greentime Hu
@ 2021-10-07 10:10                 ` Ley Foon Tan
  0 siblings, 0 replies; 55+ messages in thread
From: Ley Foon Tan @ 2021-10-07 10:10 UTC (permalink / raw)
  To: Greentime Hu
  Cc: Darius Rad, linux-riscv, Linux Kernel Mailing List, Albert Ou,
	Palmer Dabbelt, Paul Walmsley, Vincent Chen

On Tue, Oct 5, 2021 at 11:47 PM Greentime Hu <greentime.hu@sifive.com> wrote:
>
> Ley Foon Tan <lftan.linux@gmail.com> 於 2021年10月5日 週二 上午10:12寫道:
> >
> > On Mon, Oct 4, 2021 at 8:41 PM Greentime Hu <greentime.hu@sifive.com> wrote:
> > >
> > > Ley Foon Tan <lftan.linux@gmail.com> 於 2021年10月1日 週五 上午10:46寫道:
> > > >
> > > > On Wed, Sep 29, 2021 at 11:54 PM Darius Rad <darius@bluespec.com> wrote:
> > > > >
> > > > > On Tue, Sep 28, 2021 at 10:56:52PM +0800, Greentime Hu wrote:
> > > > > > Darius Rad <darius@bluespec.com> 於 2021年9月13日 週一 下午8:21寫道:
> > > > > > >
> > [....]
> >
> >
> > > > > > >
> > > > > > > So this will unconditionally enable vector instructions, and allocate
> > > > > > > memory for vector state, for all processes, regardless of whether vector
> > > > > > > instructions are used?
> > > > > > >
> > > > > >
> > > > > > Hi Darius,
> > > > > >
> > > > > > Yes, it will enable vector if has_vector() is true. The reason that we
> > > > > > choose to enable and allocate memory for user space program is because
> > > > > > we also implement some common functions in the glibc such as memcpy
> > > > > > vector version and it is called very often by every process. So that
> > > > > > we assume if the user program is running in a CPU with vector ISA
> > > > > > would like to use vector by default. If we disable it by default and
> > > > > > make it trigger the illegal instruction, that might be a burden since
> > > > > > almost every process will use vector glibc memcpy or something like
> > > > > > that.
> > > > >
> > > > > Do you have any evidence to support the assertion that almost every process
> > > > > would use vector operations?  One could easily argue that the converse is
> > > > > true: no existing software uses the vector extension now, so most likely a
> > > > > process will not be using it.
> > > > >
> > > > > >
> > > > > > > Given the size of the vector state and potential power and performance
> > > > > > > implications of enabling the vector engine, it seems like this should
> > > > > > > treated similarly to Intel AMX on x86.  The full discussion of that is
> > > > > > > here:
> > > > > > >
> > > > > > > https://lore.kernel.org/lkml/CALCETrW2QHa2TLvnUuVxAAheqcbSZ-5_WRXtDSAGcbG8N+gtdQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org/
> > > > > > >
> > > > > > > The cover letter for recent Intel AMX patches has a summary of the x86
> > > > > > > implementation:
> > > > > > >
> > > > > > > https://lore.kernel.org/lkml/20210825155413.19673-1-chang.seok.bae@intel.com/
> > > > > > >
> > > > > > > If RISC-V were to adopt a similar approach, I think the significant
> > > > > > > points are:
> > > > > > >
> > > > > > >   1. A process (or thread) must specifically request the desire to use
> > > > > > > vector extensions (perhaps with some new arch_prctl() API),
> > > > > > >
> > > > > > >   2. The kernel is free to deny permission, perhaps based on
> > > > > > > administrative rules or for other reasons, and
> > > > > > >
> > > > > > >   3. If a process attempts to use vector extensions before doing the
> > > > > > > above, the process will die due to an illegal instruction.
> > > > > >
> > > > > > Thank you for sharing this, but I am not sure if we should treat
> > > > > > vector like AMX on x86. IMHO, compiler might generate code with vector
> > > > > > instructions automatically someday, maybe we should treat vector
> > > > > > extensions like other extensions.
> > > > > > If user knows the vector extension is supported in this CPU and he
> > > > > > would like to use it, it seems we should let user use it directly just
> > > > > > like other extensions.
> > > > > > If user don't know it exists or not, user should use the library API
> > > > > > transparently and let glibc or other library deal with it. The glibc
> > > > > > ifunc feature or multi-lib should be able to choose the correct
> > > > > > implementation.
> > > > >
> > > > > What makes me think that the vector extension should be treated like AMX is
> > > > > that they both (1) have a significant amount of architectural state, and
> > > > > (2) likely have a significant power and/or area impact on (non-emulated)
> > > > > designs.
> > > > >
> > > > > For example, I think it is possible, maybe even likely, that vector
> > > > > implementations will have one or more of the following behaviors:
> > > > >
> > > > >   1. A single vector unit shared among two or more harts,
> > > > >
> > > > >   2. Additional power consumption when the vector unit is enabled and idle
> > > > > versus not being enabled at all,
> > > > >
> > > > >   3. For a system which supports variable operating frequency, a reduction
> > > > > in the maximum frequency when the vector unit is enabled, and/or
> > > > >
> > > > >   4. The inability to enter low power states and/or delays to low power
> > > > > states transitions when the vector unit is enabled.
> > > > >
> > > > > None of the above constraints apply to more ordinary extensions like
> > > > > compressed or the various bit manipulation extensions.
> > > > >
> > > > > The discussion I linked to has some well reasoned arguments on why
> > > > > substantial extensions should have a mechanism to request using them by
> > > > > user space.  The discussion was in the context of Intel AMX, but applies to
> > > > > further x86 extensions, and I think should also apply to similar extensions
> > > > > on RISC-V, like vector here.
> > > > >
> > > > There is possible use case where not all cores support vector
> > > > extension due to size, area and power.
> > > > Perhaps can have the mechanism or flow to determine the
> > > > application/thread require vector extension or it specifically request
> > > > the desire to use
> > > > vector extensions. Then this app/thread run on cpu with vector
> > > > extension capability only.
> > > >
> > >
> > > IIRC, we assume all harts has the same ability in Linux because of SMP
> > > assumption.
> > > If we have more information of hw capability and we may use this
> > > information for scheduler to switch the correct process to the correct
> > > CPU.
> > > Do you have any idea how to implement it in Linux kernel? Maybe we can
> > > list in the TODO list.
> > I think we can refer to other arch implementations as reference:
> >
> > 1. ARM64 supports 32-bit thread on asymmetric AArch32 systems. There
> > is a flag in ELF to check, then start the thread on the core that
> > supports 32-bit execution. This patchset is merged to mainline 5.15.
> > https://lore.kernel.org/linux-arm-kernel/20210730112443.23245-8-will@kernel.org/T/
>
> Wow! This is useful for AMP.
>
> >
> > 2. Link shared by Darius, on-demand request implementation on Intel AMX
> > https://lore.kernel.org/lkml/20210825155413.19673-1-chang.seok.bae@intel.com/
> >
> > glibc support optimized library functions with vector, this is enabled
> > by default if compiler is with vector extension enabled? If yes, then
> > most of the app required vector core.
>
> As I mentioned earlier, glibc ifunc will solve this issue. The
> Linux/glibc can run on platform with vector or without vector and
> glibc will use the information get from Linux kernel and using ifunc
> to decide whether it should use the vector version or not.
> Which means even your toolchain has vector glibc support and your
> Linux kernel told the glibc this platform doesn't support vector then
> the ifunc mechanism will choose the non-vector version ones.
Okay. Then Linux kernel needs to report vector capability as per core
feature, if not all SMP cores support vector.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v8 09/21] riscv: Add task switch support for vector
  2021-10-05 13:57           ` Darius Rad
@ 2021-10-21  1:01             ` Paul Walmsley
  2021-10-21 10:50               ` Darius Rad
  0 siblings, 1 reply; 55+ messages in thread
From: Paul Walmsley @ 2021-10-21  1:01 UTC (permalink / raw)
  To: Darius Rad
  Cc: Greentime Hu, linux-riscv, Linux Kernel Mailing List, Albert Ou,
	Palmer Dabbelt, Vincent Chen

[-- Attachment #1: Type: text/plain, Size: 2783 bytes --]

Hello Darius,

On Tue, 5 Oct 2021, Darius Rad wrote:

> On Mon, Oct 04, 2021 at 08:36:30PM +0800, Greentime Hu wrote:
> > Darius Rad <darius@bluespec.com> 於 2021年9月29日 週三 下午9:28寫道:
> > >
> > > On Tue, Sep 28, 2021 at 10:56:52PM +0800, Greentime Hu wrote:
> > > > Darius Rad <darius@bluespec.com> 於 2021年9月13日 週一 下午8:21寫道:
> > > > >
> > > > > On 9/8/21 1:45 PM, Greentime Hu wrote:
> > > > > > This patch adds task switch support for vector. It supports partial lazy
> > > > > > save and restore mechanism. It also supports all lengths of vlen.

[ ... ]

> > > > > So this will unconditionally enable vector instructions, and allocate
> > > > > memory for vector state, for all processes, regardless of whether vector
> > > > > instructions are used?
> > > >
> > > > Yes, it will enable vector if has_vector() is true. The reason that we
> > > > choose to enable and allocate memory for user space program is because
> > > > we also implement some common functions in the glibc such as memcpy
> > > > vector version and it is called very often by every process. So that
> > > > we assume if the user program is running in a CPU with vector ISA
> > > > would like to use vector by default. If we disable it by default and
> > > > make it trigger the illegal instruction, that might be a burden since
> > > > almost every process will use vector glibc memcpy or something like
> > > > that.
> > >
> > > Do you have any evidence to support the assertion that almost every process
> > > would use vector operations?  One could easily argue that the converse is
> > > true: no existing software uses the vector extension now, so most likely a
> > > process will not be using it.
> > 
> > Glibc ustreaming is just starting so you didn't see software using the 
> > vector extension now and this patchset is testing based on those 
> > optimized glibc too. Vincent Chen is working on the glibc vector 
> > support upstreaming and we will also upstream the vector version glibc 
> > memcpy, memcmp, memchr, memmove, memset, strcmp, strlen. Then we will 
> > see platform with vector support can use vector version mem* and str* 
> > functions automatically based on ifunc and platform without vector 
> > will use the original one automatically. These could be done to select 
> > the correct optimized glibc functions by ifunc mechanism.

In your reply, I noticed that you didn't address Greentime's response 
here.  But this looks like the key issue.  If common library functions are 
vector-accelerated, wouldn't it make sense that almost every process would 
wind up using vector instructions?  And thus there wouldn't be much point 
to skipping the vector context memory allocation?


- Paul

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v8 09/21] riscv: Add task switch support for vector
  2021-10-21  1:01             ` Paul Walmsley
@ 2021-10-21 10:50               ` Darius Rad
  2021-10-22  3:52                 ` Vincent Chen
  0 siblings, 1 reply; 55+ messages in thread
From: Darius Rad @ 2021-10-21 10:50 UTC (permalink / raw)
  To: Paul Walmsley
  Cc: Greentime Hu, linux-riscv, Linux Kernel Mailing List, Albert Ou,
	Palmer Dabbelt, Vincent Chen

On Wed, Oct 20, 2021 at 06:01:31PM -0700, Paul Walmsley wrote:
> Hello Darius,
> 
> On Tue, 5 Oct 2021, Darius Rad wrote:
> 
> > On Mon, Oct 04, 2021 at 08:36:30PM +0800, Greentime Hu wrote:
> > > Darius Rad <darius@bluespec.com> 於 2021年9月29日 週三 下午9:28寫道:
> > > >
> > > > On Tue, Sep 28, 2021 at 10:56:52PM +0800, Greentime Hu wrote:
> > > > > Darius Rad <darius@bluespec.com> 於 2021年9月13日 週一 下午8:21寫道:
> > > > > >
> > > > > > On 9/8/21 1:45 PM, Greentime Hu wrote:
> > > > > > > This patch adds task switch support for vector. It supports partial lazy
> > > > > > > save and restore mechanism. It also supports all lengths of vlen.
> 
> [ ... ]
> 
> > > > > > So this will unconditionally enable vector instructions, and allocate
> > > > > > memory for vector state, for all processes, regardless of whether vector
> > > > > > instructions are used?
> > > > >
> > > > > Yes, it will enable vector if has_vector() is true. The reason that we
> > > > > choose to enable and allocate memory for user space program is because
> > > > > we also implement some common functions in the glibc such as memcpy
> > > > > vector version and it is called very often by every process. So that
> > > > > we assume if the user program is running in a CPU with vector ISA
> > > > > would like to use vector by default. If we disable it by default and
> > > > > make it trigger the illegal instruction, that might be a burden since
> > > > > almost every process will use vector glibc memcpy or something like
> > > > > that.
> > > >
> > > > Do you have any evidence to support the assertion that almost every process
> > > > would use vector operations?  One could easily argue that the converse is
> > > > true: no existing software uses the vector extension now, so most likely a
> > > > process will not be using it.
> > > 
> > > Glibc ustreaming is just starting so you didn't see software using the 
> > > vector extension now and this patchset is testing based on those 
> > > optimized glibc too. Vincent Chen is working on the glibc vector 
> > > support upstreaming and we will also upstream the vector version glibc 
> > > memcpy, memcmp, memchr, memmove, memset, strcmp, strlen. Then we will 
> > > see platform with vector support can use vector version mem* and str* 
> > > functions automatically based on ifunc and platform without vector 
> > > will use the original one automatically. These could be done to select 
> > > the correct optimized glibc functions by ifunc mechanism.
> 
> In your reply, I noticed that you didn't address Greentime's response 
> here.  But this looks like the key issue.  If common library functions are 
> vector-accelerated, wouldn't it make sense that almost every process would 
> wind up using vector instructions?  And thus there wouldn't be much point 
> to skipping the vector context memory allocation?
> 

This issue was addressed in the thread regarding Intel AMX I linked to in a
previous message.  I don't agree that this is the key issue; it is one of a
number of issues.  What if I don't want to take the potential
power/frequency hit for the vector unit for a workload that, at best, uses
it for the occasional memcpy?  What if the allocation fails, how will that
get reported to user space (hint: not well)?  According to Greentime,
RISC-V vector is similar to ARM SVE, which allocates memory for context
state on first use and not unconditionally for all processes.

// darius


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v8 09/21] riscv: Add task switch support for vector
  2021-10-21 10:50               ` Darius Rad
@ 2021-10-22  3:52                 ` Vincent Chen
  2021-10-22 10:40                   ` Darius Rad
  0 siblings, 1 reply; 55+ messages in thread
From: Vincent Chen @ 2021-10-22  3:52 UTC (permalink / raw)
  To: Paul Walmsley, Greentime Hu, linux-riscv,
	Linux Kernel Mailing List, Albert Ou, Palmer Dabbelt,
	Vincent Chen

On Thu, Oct 21, 2021 at 6:50 PM Darius Rad <darius@bluespec.com> wrote:
>
> On Wed, Oct 20, 2021 at 06:01:31PM -0700, Paul Walmsley wrote:
> > Hello Darius,
> >
> > On Tue, 5 Oct 2021, Darius Rad wrote:
> >
> > > On Mon, Oct 04, 2021 at 08:36:30PM +0800, Greentime Hu wrote:
> > > > Darius Rad <darius@bluespec.com> 於 2021年9月29日 週三 下午9:28寫道:
> > > > >
> > > > > On Tue, Sep 28, 2021 at 10:56:52PM +0800, Greentime Hu wrote:
> > > > > > Darius Rad <darius@bluespec.com> 於 2021年9月13日 週一 下午8:21寫道:
> > > > > > >
> > > > > > > On 9/8/21 1:45 PM, Greentime Hu wrote:
> > > > > > > > This patch adds task switch support for vector. It supports partial lazy
> > > > > > > > save and restore mechanism. It also supports all lengths of vlen.
> >
> > [ ... ]
> >
> > > > > > > So this will unconditionally enable vector instructions, and allocate
> > > > > > > memory for vector state, for all processes, regardless of whether vector
> > > > > > > instructions are used?
> > > > > >
> > > > > > Yes, it will enable vector if has_vector() is true. The reason that we
> > > > > > choose to enable and allocate memory for user space program is because
> > > > > > we also implement some common functions in the glibc such as memcpy
> > > > > > vector version and it is called very often by every process. So that
> > > > > > we assume if the user program is running in a CPU with vector ISA
> > > > > > would like to use vector by default. If we disable it by default and
> > > > > > make it trigger the illegal instruction, that might be a burden since
> > > > > > almost every process will use vector glibc memcpy or something like
> > > > > > that.
> > > > >
> > > > > Do you have any evidence to support the assertion that almost every process
> > > > > would use vector operations?  One could easily argue that the converse is
> > > > > true: no existing software uses the vector extension now, so most likely a
> > > > > process will not be using it.
> > > >
> > > > Glibc ustreaming is just starting so you didn't see software using the
> > > > vector extension now and this patchset is testing based on those
> > > > optimized glibc too. Vincent Chen is working on the glibc vector
> > > > support upstreaming and we will also upstream the vector version glibc
> > > > memcpy, memcmp, memchr, memmove, memset, strcmp, strlen. Then we will
> > > > see platform with vector support can use vector version mem* and str*
> > > > functions automatically based on ifunc and platform without vector
> > > > will use the original one automatically. These could be done to select
> > > > the correct optimized glibc functions by ifunc mechanism.
> >
> > In your reply, I noticed that you didn't address Greentime's response
> > here.  But this looks like the key issue.  If common library functions are
> > vector-accelerated, wouldn't it make sense that almost every process would
> > wind up using vector instructions?  And thus there wouldn't be much point
> > to skipping the vector context memory allocation?
> >
>
> This issue was addressed in the thread regarding Intel AMX I linked to in a
> previous message.  I don't agree that this is the key issue; it is one of a
> number of issues.  What if I don't want to take the potential
> power/frequency hit for the vector unit for a workload that, at best, uses
> it for the occasional memcpy?  What if the allocation fails, how will that

Hi Darius,
The memcpy function seems not to be occasionally used in the programs
because many functions in Glibc use memcpy() to complete the memory
copy. I use the following simple case as an example.
test.c
void main(void) {
    return;
}
Then, we compile it by "gcc test.c -o a.out" and execute it. In the
execution, the memcpy() has been called unexpectedly. It is because
many libc initialized functions will be executed before entering the
user-defined main function. One of the example is __libc_setup_tls(),
which is called by __libc_start_main(). The __libc_setup_tls() will
use memcpy() during the process of creating the Dynamic Thread Vector
(DTV).

Therefore, I think the memcpy() is widely used in most programs.

> get reported to user space (hint: not well)?  According to Greentime,
> RISC-V vector is similar to ARM SVE, which allocates memory for context
> state on first use and not unconditionally for all processes.
>
> // darius
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v8 09/21] riscv: Add task switch support for vector
  2021-10-22  3:52                 ` Vincent Chen
@ 2021-10-22 10:40                   ` Darius Rad
  2021-10-25  4:47                     ` Greentime Hu
  2021-10-26 14:58                     ` Heiko Stübner
  0 siblings, 2 replies; 55+ messages in thread
From: Darius Rad @ 2021-10-22 10:40 UTC (permalink / raw)
  To: Vincent Chen
  Cc: Paul Walmsley, Greentime Hu, linux-riscv,
	Linux Kernel Mailing List, Albert Ou, Palmer Dabbelt

On Fri, Oct 22, 2021 at 11:52:01AM +0800, Vincent Chen wrote:
> On Thu, Oct 21, 2021 at 6:50 PM Darius Rad <darius@bluespec.com> wrote:
> >
> > On Wed, Oct 20, 2021 at 06:01:31PM -0700, Paul Walmsley wrote:
> > > Hello Darius,
> > >
> > > On Tue, 5 Oct 2021, Darius Rad wrote:
> > >
> > > > On Mon, Oct 04, 2021 at 08:36:30PM +0800, Greentime Hu wrote:
> > > > > Darius Rad <darius@bluespec.com> 於 2021年9月29日 週三 下午9:28寫道:
> > > > > >
> > > > > > On Tue, Sep 28, 2021 at 10:56:52PM +0800, Greentime Hu wrote:
> > > > > > > Darius Rad <darius@bluespec.com> 於 2021年9月13日 週一 下午8:21寫道:
> > > > > > > >
> > > > > > > > On 9/8/21 1:45 PM, Greentime Hu wrote:
> > > > > > > > > This patch adds task switch support for vector. It supports partial lazy
> > > > > > > > > save and restore mechanism. It also supports all lengths of vlen.
> > >
> > > [ ... ]
> > >
> > > > > > > > So this will unconditionally enable vector instructions, and allocate
> > > > > > > > memory for vector state, for all processes, regardless of whether vector
> > > > > > > > instructions are used?
> > > > > > >
> > > > > > > Yes, it will enable vector if has_vector() is true. The reason that we
> > > > > > > choose to enable and allocate memory for user space program is because
> > > > > > > we also implement some common functions in the glibc such as memcpy
> > > > > > > vector version and it is called very often by every process. So that
> > > > > > > we assume if the user program is running in a CPU with vector ISA
> > > > > > > would like to use vector by default. If we disable it by default and
> > > > > > > make it trigger the illegal instruction, that might be a burden since
> > > > > > > almost every process will use vector glibc memcpy or something like
> > > > > > > that.
> > > > > >
> > > > > > Do you have any evidence to support the assertion that almost every process
> > > > > > would use vector operations?  One could easily argue that the converse is
> > > > > > true: no existing software uses the vector extension now, so most likely a
> > > > > > process will not be using it.
> > > > >
> > > > > Glibc ustreaming is just starting so you didn't see software using the
> > > > > vector extension now and this patchset is testing based on those
> > > > > optimized glibc too. Vincent Chen is working on the glibc vector
> > > > > support upstreaming and we will also upstream the vector version glibc
> > > > > memcpy, memcmp, memchr, memmove, memset, strcmp, strlen. Then we will
> > > > > see platform with vector support can use vector version mem* and str*
> > > > > functions automatically based on ifunc and platform without vector
> > > > > will use the original one automatically. These could be done to select
> > > > > the correct optimized glibc functions by ifunc mechanism.
> > >
> > > In your reply, I noticed that you didn't address Greentime's response
> > > here.  But this looks like the key issue.  If common library functions are
> > > vector-accelerated, wouldn't it make sense that almost every process would
> > > wind up using vector instructions?  And thus there wouldn't be much point
> > > to skipping the vector context memory allocation?
> > >
> >
> > This issue was addressed in the thread regarding Intel AMX I linked to in a
> > previous message.  I don't agree that this is the key issue; it is one of a
> > number of issues.  What if I don't want to take the potential
> > power/frequency hit for the vector unit for a workload that, at best, uses
> > it for the occasional memcpy?  What if the allocation fails, how will that
> 
> Hi Darius,
> The memcpy function seems not to be occasionally used in the programs
> because many functions in Glibc use memcpy() to complete the memory
> copy. I use the following simple case as an example.
> test.c
> void main(void) {
>     return;
> }
> Then, we compile it by "gcc test.c -o a.out" and execute it. In the
> execution, the memcpy() has been called unexpectedly. It is because
> many libc initialized functions will be executed before entering the
> user-defined main function. One of the example is __libc_setup_tls(),
> which is called by __libc_start_main(). The __libc_setup_tls() will
> use memcpy() during the process of creating the Dynamic Thread Vector
> (DTV).
> 
> Therefore, I think the memcpy() is widely used in most programs.
> 

You're missing my point.  Not every (any?) program spends a majority of the
time doing memcpy(), and even if a program did, all of my points are still
valid.

Please read the discussion in the thread I referenced and the questions in
my prior message.

> > get reported to user space (hint: not well)?  According to Greentime,
> > RISC-V vector is similar to ARM SVE, which allocates memory for context
> > state on first use and not unconditionally for all processes.
> >

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v8 09/21] riscv: Add task switch support for vector
  2021-10-22 10:40                   ` Darius Rad
@ 2021-10-25  4:47                     ` Greentime Hu
  2021-10-25 16:22                       ` Darius Rad
  2021-10-26 14:58                     ` Heiko Stübner
  1 sibling, 1 reply; 55+ messages in thread
From: Greentime Hu @ 2021-10-25  4:47 UTC (permalink / raw)
  To: Vincent Chen, Paul Walmsley, Greentime Hu, linux-riscv,
	Linux Kernel Mailing List, Albert Ou, Palmer Dabbelt

Darius Rad <darius@bluespec.com> 於 2021年10月22日 週五 下午6:40寫道:
>
> On Fri, Oct 22, 2021 at 11:52:01AM +0800, Vincent Chen wrote:
> > On Thu, Oct 21, 2021 at 6:50 PM Darius Rad <darius@bluespec.com> wrote:
> > >
> > > On Wed, Oct 20, 2021 at 06:01:31PM -0700, Paul Walmsley wrote:
> > > > Hello Darius,
> > > >
> > > > On Tue, 5 Oct 2021, Darius Rad wrote:
> > > >
> > > > > On Mon, Oct 04, 2021 at 08:36:30PM +0800, Greentime Hu wrote:
> > > > > > Darius Rad <darius@bluespec.com> 於 2021年9月29日 週三 下午9:28寫道:
> > > > > > >
> > > > > > > On Tue, Sep 28, 2021 at 10:56:52PM +0800, Greentime Hu wrote:
> > > > > > > > Darius Rad <darius@bluespec.com> 於 2021年9月13日 週一 下午8:21寫道:
> > > > > > > > >
> > > > > > > > > On 9/8/21 1:45 PM, Greentime Hu wrote:
> > > > > > > > > > This patch adds task switch support for vector. It supports partial lazy
> > > > > > > > > > save and restore mechanism. It also supports all lengths of vlen.
> > > >
> > > > [ ... ]
> > > >
> > > > > > > > > So this will unconditionally enable vector instructions, and allocate
> > > > > > > > > memory for vector state, for all processes, regardless of whether vector
> > > > > > > > > instructions are used?
> > > > > > > >
> > > > > > > > Yes, it will enable vector if has_vector() is true. The reason that we
> > > > > > > > choose to enable and allocate memory for user space program is because
> > > > > > > > we also implement some common functions in the glibc such as memcpy
> > > > > > > > vector version and it is called very often by every process. So that
> > > > > > > > we assume if the user program is running in a CPU with vector ISA
> > > > > > > > would like to use vector by default. If we disable it by default and
> > > > > > > > make it trigger the illegal instruction, that might be a burden since
> > > > > > > > almost every process will use vector glibc memcpy or something like
> > > > > > > > that.
> > > > > > >
> > > > > > > Do you have any evidence to support the assertion that almost every process
> > > > > > > would use vector operations?  One could easily argue that the converse is
> > > > > > > true: no existing software uses the vector extension now, so most likely a
> > > > > > > process will not be using it.
> > > > > >
> > > > > > Glibc ustreaming is just starting so you didn't see software using the
> > > > > > vector extension now and this patchset is testing based on those
> > > > > > optimized glibc too. Vincent Chen is working on the glibc vector
> > > > > > support upstreaming and we will also upstream the vector version glibc
> > > > > > memcpy, memcmp, memchr, memmove, memset, strcmp, strlen. Then we will
> > > > > > see platform with vector support can use vector version mem* and str*
> > > > > > functions automatically based on ifunc and platform without vector
> > > > > > will use the original one automatically. These could be done to select
> > > > > > the correct optimized glibc functions by ifunc mechanism.
> > > >
> > > > In your reply, I noticed that you didn't address Greentime's response
> > > > here.  But this looks like the key issue.  If common library functions are
> > > > vector-accelerated, wouldn't it make sense that almost every process would
> > > > wind up using vector instructions?  And thus there wouldn't be much point
> > > > to skipping the vector context memory allocation?
> > > >
> > >
> > > This issue was addressed in the thread regarding Intel AMX I linked to in a
> > > previous message.  I don't agree that this is the key issue; it is one of a
> > > number of issues.  What if I don't want to take the potential
> > > power/frequency hit for the vector unit for a workload that, at best, uses
> > > it for the occasional memcpy?  What if the allocation fails, how will that
> >
> > Hi Darius,
> > The memcpy function seems not to be occasionally used in the programs
> > because many functions in Glibc use memcpy() to complete the memory
> > copy. I use the following simple case as an example.
> > test.c
> > void main(void) {
> >     return;
> > }
> > Then, we compile it by "gcc test.c -o a.out" and execute it. In the
> > execution, the memcpy() has been called unexpectedly. It is because
> > many libc initialized functions will be executed before entering the
> > user-defined main function. One of the example is __libc_setup_tls(),
> > which is called by __libc_start_main(). The __libc_setup_tls() will
> > use memcpy() during the process of creating the Dynamic Thread Vector
> > (DTV).
> >
> > Therefore, I think the memcpy() is widely used in most programs.
> >
>
> You're missing my point.  Not every (any?) program spends a majority of the
> time doing memcpy(), and even if a program did, all of my points are still
> valid.
>
> Please read the discussion in the thread I referenced and the questions in
> my prior message.
>

Hi Darius,

As I mentioned before, we want to treat vector ISA like a general ISA
instead of a specific IP. User program should be able to use it
transparently just like FPU.
It seems that the use case you want is asking user to use vector like
a specific IP, user program should ask kernel before they use it and
that is not what we want to do in this patchset.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v8 09/21] riscv: Add task switch support for vector
  2021-10-25  4:47                     ` Greentime Hu
@ 2021-10-25 16:22                       ` Darius Rad
  2021-10-26  4:44                         ` Greentime Hu
  0 siblings, 1 reply; 55+ messages in thread
From: Darius Rad @ 2021-10-25 16:22 UTC (permalink / raw)
  To: Greentime Hu
  Cc: Vincent Chen, Paul Walmsley, linux-riscv,
	Linux Kernel Mailing List, Albert Ou, Palmer Dabbelt

On Mon, Oct 25, 2021 at 12:47:49PM +0800, Greentime Hu wrote:
> Darius Rad <darius@bluespec.com> 於 2021年10月22日 週五 下午6:40寫道:
> >
> > On Fri, Oct 22, 2021 at 11:52:01AM +0800, Vincent Chen wrote:
> > > On Thu, Oct 21, 2021 at 6:50 PM Darius Rad <darius@bluespec.com> wrote:
> > > >
> > > > On Wed, Oct 20, 2021 at 06:01:31PM -0700, Paul Walmsley wrote:
> > > > > Hello Darius,
> > > > >
> > > > > On Tue, 5 Oct 2021, Darius Rad wrote:
> > > > >
> > > > > > On Mon, Oct 04, 2021 at 08:36:30PM +0800, Greentime Hu wrote:
> > > > > > > Darius Rad <darius@bluespec.com> 於 2021年9月29日 週三 下午9:28寫道:
> > > > > > > >
> > > > > > > > On Tue, Sep 28, 2021 at 10:56:52PM +0800, Greentime Hu wrote:
> > > > > > > > > Darius Rad <darius@bluespec.com> 於 2021年9月13日 週一 下午8:21寫道:
> > > > > > > > > >
> > > > > > > > > > On 9/8/21 1:45 PM, Greentime Hu wrote:
> > > > > > > > > > > This patch adds task switch support for vector. It supports partial lazy
> > > > > > > > > > > save and restore mechanism. It also supports all lengths of vlen.
> > > > >
> > > > > [ ... ]
> > > > >
> > > > > > > > > > So this will unconditionally enable vector instructions, and allocate
> > > > > > > > > > memory for vector state, for all processes, regardless of whether vector
> > > > > > > > > > instructions are used?
> > > > > > > > >
> > > > > > > > > Yes, it will enable vector if has_vector() is true. The reason that we
> > > > > > > > > choose to enable and allocate memory for user space program is because
> > > > > > > > > we also implement some common functions in the glibc such as memcpy
> > > > > > > > > vector version and it is called very often by every process. So that
> > > > > > > > > we assume if the user program is running in a CPU with vector ISA
> > > > > > > > > would like to use vector by default. If we disable it by default and
> > > > > > > > > make it trigger the illegal instruction, that might be a burden since
> > > > > > > > > almost every process will use vector glibc memcpy or something like
> > > > > > > > > that.
> > > > > > > >
> > > > > > > > Do you have any evidence to support the assertion that almost every process
> > > > > > > > would use vector operations?  One could easily argue that the converse is
> > > > > > > > true: no existing software uses the vector extension now, so most likely a
> > > > > > > > process will not be using it.
> > > > > > >
> > > > > > > Glibc ustreaming is just starting so you didn't see software using the
> > > > > > > vector extension now and this patchset is testing based on those
> > > > > > > optimized glibc too. Vincent Chen is working on the glibc vector
> > > > > > > support upstreaming and we will also upstream the vector version glibc
> > > > > > > memcpy, memcmp, memchr, memmove, memset, strcmp, strlen. Then we will
> > > > > > > see platform with vector support can use vector version mem* and str*
> > > > > > > functions automatically based on ifunc and platform without vector
> > > > > > > will use the original one automatically. These could be done to select
> > > > > > > the correct optimized glibc functions by ifunc mechanism.
> > > > >
> > > > > In your reply, I noticed that you didn't address Greentime's response
> > > > > here.  But this looks like the key issue.  If common library functions are
> > > > > vector-accelerated, wouldn't it make sense that almost every process would
> > > > > wind up using vector instructions?  And thus there wouldn't be much point
> > > > > to skipping the vector context memory allocation?
> > > > >
> > > >
> > > > This issue was addressed in the thread regarding Intel AMX I linked to in a
> > > > previous message.  I don't agree that this is the key issue; it is one of a
> > > > number of issues.  What if I don't want to take the potential
> > > > power/frequency hit for the vector unit for a workload that, at best, uses
> > > > it for the occasional memcpy?  What if the allocation fails, how will that
> > >
> > > Hi Darius,
> > > The memcpy function seems not to be occasionally used in the programs
> > > because many functions in Glibc use memcpy() to complete the memory
> > > copy. I use the following simple case as an example.
> > > test.c
> > > void main(void) {
> > >     return;
> > > }
> > > Then, we compile it by "gcc test.c -o a.out" and execute it. In the
> > > execution, the memcpy() has been called unexpectedly. It is because
> > > many libc initialized functions will be executed before entering the
> > > user-defined main function. One of the example is __libc_setup_tls(),
> > > which is called by __libc_start_main(). The __libc_setup_tls() will
> > > use memcpy() during the process of creating the Dynamic Thread Vector
> > > (DTV).
> > >
> > > Therefore, I think the memcpy() is widely used in most programs.
> > >
> >
> > You're missing my point.  Not every (any?) program spends a majority of the
> > time doing memcpy(), and even if a program did, all of my points are still
> > valid.
> >
> > Please read the discussion in the thread I referenced and the questions in
> > my prior message.
> >
> 
> Hi Darius,
> 
> As I mentioned before, we want to treat vector ISA like a general ISA
> instead of a specific IP. User program should be able to use it
> transparently just like FPU.
> It seems that the use case you want is asking user to use vector like
> a specific IP, user program should ask kernel before they use it and
> that is not what we want to do in this patchset.
> 

Hi Greentime,

Right.

But beyond what I want to do or what you want to do, is what *should* Linux
do?  I have attempted to provide evidence to support my position.  You have
not responded to or addressed the majority of my questions, which is
concerning to me.

// darius


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v8 09/21] riscv: Add task switch support for vector
  2021-10-25 16:22                       ` Darius Rad
@ 2021-10-26  4:44                         ` Greentime Hu
  2021-10-27 12:58                           ` Darius Rad
  0 siblings, 1 reply; 55+ messages in thread
From: Greentime Hu @ 2021-10-26  4:44 UTC (permalink / raw)
  To: Greentime Hu, Vincent Chen, Paul Walmsley, linux-riscv,
	Linux Kernel Mailing List, Albert Ou, Palmer Dabbelt

Darius Rad <darius@bluespec.com> 於 2021年10月26日 週二 上午12:22寫道:
>
> On Mon, Oct 25, 2021 at 12:47:49PM +0800, Greentime Hu wrote:
> > Darius Rad <darius@bluespec.com> 於 2021年10月22日 週五 下午6:40寫道:
> > >
> > > On Fri, Oct 22, 2021 at 11:52:01AM +0800, Vincent Chen wrote:
> > > > On Thu, Oct 21, 2021 at 6:50 PM Darius Rad <darius@bluespec.com> wrote:
> > > > >
> > > > > On Wed, Oct 20, 2021 at 06:01:31PM -0700, Paul Walmsley wrote:
> > > > > > Hello Darius,
> > > > > >
> > > > > > On Tue, 5 Oct 2021, Darius Rad wrote:
> > > > > >
> > > > > > > On Mon, Oct 04, 2021 at 08:36:30PM +0800, Greentime Hu wrote:
> > > > > > > > Darius Rad <darius@bluespec.com> 於 2021年9月29日 週三 下午9:28寫道:
> > > > > > > > >
> > > > > > > > > On Tue, Sep 28, 2021 at 10:56:52PM +0800, Greentime Hu wrote:
> > > > > > > > > > Darius Rad <darius@bluespec.com> 於 2021年9月13日 週一 下午8:21寫道:
> > > > > > > > > > >
> > > > > > > > > > > On 9/8/21 1:45 PM, Greentime Hu wrote:
> > > > > > > > > > > > This patch adds task switch support for vector. It supports partial lazy
> > > > > > > > > > > > save and restore mechanism. It also supports all lengths of vlen.
> > > > > >
> > > > > > [ ... ]
> > > > > >
> > > > > > > > > > > So this will unconditionally enable vector instructions, and allocate
> > > > > > > > > > > memory for vector state, for all processes, regardless of whether vector
> > > > > > > > > > > instructions are used?
> > > > > > > > > >
> > > > > > > > > > Yes, it will enable vector if has_vector() is true. The reason that we
> > > > > > > > > > choose to enable and allocate memory for user space program is because
> > > > > > > > > > we also implement some common functions in the glibc such as memcpy
> > > > > > > > > > vector version and it is called very often by every process. So that
> > > > > > > > > > we assume if the user program is running in a CPU with vector ISA
> > > > > > > > > > would like to use vector by default. If we disable it by default and
> > > > > > > > > > make it trigger the illegal instruction, that might be a burden since
> > > > > > > > > > almost every process will use vector glibc memcpy or something like
> > > > > > > > > > that.
> > > > > > > > >
> > > > > > > > > Do you have any evidence to support the assertion that almost every process
> > > > > > > > > would use vector operations?  One could easily argue that the converse is
> > > > > > > > > true: no existing software uses the vector extension now, so most likely a
> > > > > > > > > process will not be using it.
> > > > > > > >
> > > > > > > > Glibc ustreaming is just starting so you didn't see software using the
> > > > > > > > vector extension now and this patchset is testing based on those
> > > > > > > > optimized glibc too. Vincent Chen is working on the glibc vector
> > > > > > > > support upstreaming and we will also upstream the vector version glibc
> > > > > > > > memcpy, memcmp, memchr, memmove, memset, strcmp, strlen. Then we will
> > > > > > > > see platform with vector support can use vector version mem* and str*
> > > > > > > > functions automatically based on ifunc and platform without vector
> > > > > > > > will use the original one automatically. These could be done to select
> > > > > > > > the correct optimized glibc functions by ifunc mechanism.
> > > > > >
> > > > > > In your reply, I noticed that you didn't address Greentime's response
> > > > > > here.  But this looks like the key issue.  If common library functions are
> > > > > > vector-accelerated, wouldn't it make sense that almost every process would
> > > > > > wind up using vector instructions?  And thus there wouldn't be much point
> > > > > > to skipping the vector context memory allocation?
> > > > > >
> > > > >
> > > > > This issue was addressed in the thread regarding Intel AMX I linked to in a
> > > > > previous message.  I don't agree that this is the key issue; it is one of a
> > > > > number of issues.  What if I don't want to take the potential
> > > > > power/frequency hit for the vector unit for a workload that, at best, uses
> > > > > it for the occasional memcpy?  What if the allocation fails, how will that
> > > >
> > > > Hi Darius,
> > > > The memcpy function seems not to be occasionally used in the programs
> > > > because many functions in Glibc use memcpy() to complete the memory
> > > > copy. I use the following simple case as an example.
> > > > test.c
> > > > void main(void) {
> > > >     return;
> > > > }
> > > > Then, we compile it by "gcc test.c -o a.out" and execute it. In the
> > > > execution, the memcpy() has been called unexpectedly. It is because
> > > > many libc initialized functions will be executed before entering the
> > > > user-defined main function. One of the example is __libc_setup_tls(),
> > > > which is called by __libc_start_main(). The __libc_setup_tls() will
> > > > use memcpy() during the process of creating the Dynamic Thread Vector
> > > > (DTV).
> > > >
> > > > Therefore, I think the memcpy() is widely used in most programs.
> > > >
> > >
> > > You're missing my point.  Not every (any?) program spends a majority of the
> > > time doing memcpy(), and even if a program did, all of my points are still
> > > valid.
> > >
> > > Please read the discussion in the thread I referenced and the questions in
> > > my prior message.
> > >
> >
> > Hi Darius,
> >
> > As I mentioned before, we want to treat vector ISA like a general ISA
> > instead of a specific IP. User program should be able to use it
> > transparently just like FPU.
> > It seems that the use case you want is asking user to use vector like
> > a specific IP, user program should ask kernel before they use it and
> > that is not what we want to do in this patchset.
> >
>
> Hi Greentime,
>
> Right.
>
> But beyond what I want to do or what you want to do, is what *should* Linux
> do?  I have attempted to provide evidence to support my position.  You have
> not responded to or addressed the majority of my questions, which is
> concerning to me.

Hi Darius,

What is your majority questions?

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v8 09/21] riscv: Add task switch support for vector
  2021-10-22 10:40                   ` Darius Rad
  2021-10-25  4:47                     ` Greentime Hu
@ 2021-10-26 14:58                     ` Heiko Stübner
  1 sibling, 0 replies; 55+ messages in thread
From: Heiko Stübner @ 2021-10-26 14:58 UTC (permalink / raw)
  To: Vincent Chen, Paul Walmsley, Greentime Hu, linux-riscv,
	Linux Kernel Mailing List, Albert Ou, Palmer Dabbelt

Hi Darius,

Am Freitag, 22. Oktober 2021, 12:40:46 CEST schrieb Darius Rad:
> On Fri, Oct 22, 2021 at 11:52:01AM +0800, Vincent Chen wrote:
> > On Thu, Oct 21, 2021 at 6:50 PM Darius Rad <darius@bluespec.com> wrote:
> > >
> > > On Wed, Oct 20, 2021 at 06:01:31PM -0700, Paul Walmsley wrote:
> > > > Hello Darius,
> > > >
> > > > On Tue, 5 Oct 2021, Darius Rad wrote:
> > > >
> > > > > On Mon, Oct 04, 2021 at 08:36:30PM +0800, Greentime Hu wrote:
> > > > > > Darius Rad <darius@bluespec.com> 於 2021年9月29日 週三 下午9:28寫道:
> > > > > > >
> > > > > > > On Tue, Sep 28, 2021 at 10:56:52PM +0800, Greentime Hu wrote:
> > > > > > > > Darius Rad <darius@bluespec.com> 於 2021年9月13日 週一 下午8:21寫道:
> > > > > > > > >
> > > > > > > > > On 9/8/21 1:45 PM, Greentime Hu wrote:
> > > > > > > > > > This patch adds task switch support for vector. It supports partial lazy
> > > > > > > > > > save and restore mechanism. It also supports all lengths of vlen.
> > > >
> > > > [ ... ]
> > > >
> > > > > > > > > So this will unconditionally enable vector instructions, and allocate
> > > > > > > > > memory for vector state, for all processes, regardless of whether vector
> > > > > > > > > instructions are used?
> > > > > > > >
> > > > > > > > Yes, it will enable vector if has_vector() is true. The reason that we
> > > > > > > > choose to enable and allocate memory for user space program is because
> > > > > > > > we also implement some common functions in the glibc such as memcpy
> > > > > > > > vector version and it is called very often by every process. So that
> > > > > > > > we assume if the user program is running in a CPU with vector ISA
> > > > > > > > would like to use vector by default. If we disable it by default and
> > > > > > > > make it trigger the illegal instruction, that might be a burden since
> > > > > > > > almost every process will use vector glibc memcpy or something like
> > > > > > > > that.
> > > > > > >
> > > > > > > Do you have any evidence to support the assertion that almost every process
> > > > > > > would use vector operations?  One could easily argue that the converse is
> > > > > > > true: no existing software uses the vector extension now, so most likely a
> > > > > > > process will not be using it.
> > > > > >
> > > > > > Glibc ustreaming is just starting so you didn't see software using the
> > > > > > vector extension now and this patchset is testing based on those
> > > > > > optimized glibc too. Vincent Chen is working on the glibc vector
> > > > > > support upstreaming and we will also upstream the vector version glibc
> > > > > > memcpy, memcmp, memchr, memmove, memset, strcmp, strlen. Then we will
> > > > > > see platform with vector support can use vector version mem* and str*
> > > > > > functions automatically based on ifunc and platform without vector
> > > > > > will use the original one automatically. These could be done to select
> > > > > > the correct optimized glibc functions by ifunc mechanism.
> > > >
> > > > In your reply, I noticed that you didn't address Greentime's response
> > > > here.  But this looks like the key issue.  If common library functions are
> > > > vector-accelerated, wouldn't it make sense that almost every process would
> > > > wind up using vector instructions?  And thus there wouldn't be much point
> > > > to skipping the vector context memory allocation?
> > > >
> > >
> > > This issue was addressed in the thread regarding Intel AMX I linked to in a
> > > previous message.  I don't agree that this is the key issue; it is one of a
> > > number of issues.  What if I don't want to take the potential
> > > power/frequency hit for the vector unit for a workload that, at best, uses
> > > it for the occasional memcpy?  What if the allocation fails, how will that
> > 
> > Hi Darius,
> > The memcpy function seems not to be occasionally used in the programs
> > because many functions in Glibc use memcpy() to complete the memory
> > copy. I use the following simple case as an example.
> > test.c
> > void main(void) {
> >     return;
> > }
> > Then, we compile it by "gcc test.c -o a.out" and execute it. In the
> > execution, the memcpy() has been called unexpectedly. It is because
> > many libc initialized functions will be executed before entering the
> > user-defined main function. One of the example is __libc_setup_tls(),
> > which is called by __libc_start_main(). The __libc_setup_tls() will
> > use memcpy() during the process of creating the Dynamic Thread Vector
> > (DTV).
> > 
> > Therefore, I think the memcpy() is widely used in most programs.
> > 
> 
> You're missing my point.  Not every (any?) program spends a majority of the
> time doing memcpy(), and even if a program did, all of my points are still
> valid.
> 
> Please read the discussion in the thread I referenced and the questions in
> my prior message.

for people reading along at home, do have a different link by chance?

I.e. the link to
	https://lore.kernel.org/lkml/CALCETrW2QHa2TLvnUuVxAAheqcbSZ-5_WRXtDSAGcbG8N+gtdQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org/
is not a known message-id on lore.kernel.org it seems.


Thanks
Heiko

> > > get reported to user space (hint: not well)?  According to Greentime,
> > > RISC-V vector is similar to ARM SVE, which allocates memory for context
> > > state on first use and not unconditionally for all processes.
> > >
> 
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
> 





^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v8 09/21] riscv: Add task switch support for vector
  2021-10-26  4:44                         ` Greentime Hu
@ 2021-10-27 12:58                           ` Darius Rad
  2021-11-09  9:49                             ` Greentime Hu
  0 siblings, 1 reply; 55+ messages in thread
From: Darius Rad @ 2021-10-27 12:58 UTC (permalink / raw)
  To: Greentime Hu
  Cc: Vincent Chen, Paul Walmsley, linux-riscv,
	Linux Kernel Mailing List, Albert Ou, Palmer Dabbelt

On Tue, Oct 26, 2021 at 12:44:31PM +0800, Greentime Hu wrote:
> Darius Rad <darius@bluespec.com> 於 2021年10月26日 週二 上午12:22寫道:
> >
> > On Mon, Oct 25, 2021 at 12:47:49PM +0800, Greentime Hu wrote:
> > > Darius Rad <darius@bluespec.com> 於 2021年10月22日 週五 下午6:40寫道:
> > > >
> > > > On Fri, Oct 22, 2021 at 11:52:01AM +0800, Vincent Chen wrote:
> > > > > On Thu, Oct 21, 2021 at 6:50 PM Darius Rad <darius@bluespec.com> wrote:
> > > > > >
> > > > > > On Wed, Oct 20, 2021 at 06:01:31PM -0700, Paul Walmsley wrote:
> > > > > > > Hello Darius,
> > > > > > >
> > > > > > > On Tue, 5 Oct 2021, Darius Rad wrote:
> > > > > > >
> > > > > > > > On Mon, Oct 04, 2021 at 08:36:30PM +0800, Greentime Hu wrote:
> > > > > > > > > Darius Rad <darius@bluespec.com> 於 2021年9月29日 週三 下午9:28寫道:
> > > > > > > > > >
> > > > > > > > > > On Tue, Sep 28, 2021 at 10:56:52PM +0800, Greentime Hu wrote:
> > > > > > > > > > > Darius Rad <darius@bluespec.com> 於 2021年9月13日 週一 下午8:21寫道:
> > > > > > > > > > > >
> > > > > > > > > > > > On 9/8/21 1:45 PM, Greentime Hu wrote:
> > > > > > > > > > > > > This patch adds task switch support for vector. It supports partial lazy
> > > > > > > > > > > > > save and restore mechanism. It also supports all lengths of vlen.
> > > > > > >
> > > > > > > [ ... ]
> > > > > > >
> > > > > > > > > > > > So this will unconditionally enable vector instructions, and allocate
> > > > > > > > > > > > memory for vector state, for all processes, regardless of whether vector
> > > > > > > > > > > > instructions are used?
> > > > > > > > > > >
> > > > > > > > > > > Yes, it will enable vector if has_vector() is true. The reason that we
> > > > > > > > > > > choose to enable and allocate memory for user space program is because
> > > > > > > > > > > we also implement some common functions in the glibc such as memcpy
> > > > > > > > > > > vector version and it is called very often by every process. So that
> > > > > > > > > > > we assume if the user program is running in a CPU with vector ISA
> > > > > > > > > > > would like to use vector by default. If we disable it by default and
> > > > > > > > > > > make it trigger the illegal instruction, that might be a burden since
> > > > > > > > > > > almost every process will use vector glibc memcpy or something like
> > > > > > > > > > > that.
> > > > > > > > > >
> > > > > > > > > > Do you have any evidence to support the assertion that almost every process
> > > > > > > > > > would use vector operations?  One could easily argue that the converse is
> > > > > > > > > > true: no existing software uses the vector extension now, so most likely a
> > > > > > > > > > process will not be using it.
> > > > > > > > >
> > > > > > > > > Glibc ustreaming is just starting so you didn't see software using the
> > > > > > > > > vector extension now and this patchset is testing based on those
> > > > > > > > > optimized glibc too. Vincent Chen is working on the glibc vector
> > > > > > > > > support upstreaming and we will also upstream the vector version glibc
> > > > > > > > > memcpy, memcmp, memchr, memmove, memset, strcmp, strlen. Then we will
> > > > > > > > > see platform with vector support can use vector version mem* and str*
> > > > > > > > > functions automatically based on ifunc and platform without vector
> > > > > > > > > will use the original one automatically. These could be done to select
> > > > > > > > > the correct optimized glibc functions by ifunc mechanism.
> > > > > > >
> > > > > > > In your reply, I noticed that you didn't address Greentime's response
> > > > > > > here.  But this looks like the key issue.  If common library functions are
> > > > > > > vector-accelerated, wouldn't it make sense that almost every process would
> > > > > > > wind up using vector instructions?  And thus there wouldn't be much point
> > > > > > > to skipping the vector context memory allocation?
> > > > > > >
> > > > > >
> > > > > > This issue was addressed in the thread regarding Intel AMX I linked to in a
> > > > > > previous message.  I don't agree that this is the key issue; it is one of a
> > > > > > number of issues.  What if I don't want to take the potential
> > > > > > power/frequency hit for the vector unit for a workload that, at best, uses
> > > > > > it for the occasional memcpy?  What if the allocation fails, how will that
> > > > >
> > > > > Hi Darius,
> > > > > The memcpy function seems not to be occasionally used in the programs
> > > > > because many functions in Glibc use memcpy() to complete the memory
> > > > > copy. I use the following simple case as an example.
> > > > > test.c
> > > > > void main(void) {
> > > > >     return;
> > > > > }
> > > > > Then, we compile it by "gcc test.c -o a.out" and execute it. In the
> > > > > execution, the memcpy() has been called unexpectedly. It is because
> > > > > many libc initialized functions will be executed before entering the
> > > > > user-defined main function. One of the example is __libc_setup_tls(),
> > > > > which is called by __libc_start_main(). The __libc_setup_tls() will
> > > > > use memcpy() during the process of creating the Dynamic Thread Vector
> > > > > (DTV).
> > > > >
> > > > > Therefore, I think the memcpy() is widely used in most programs.
> > > > >
> > > >
> > > > You're missing my point.  Not every (any?) program spends a majority of the
> > > > time doing memcpy(), and even if a program did, all of my points are still
> > > > valid.
> > > >
> > > > Please read the discussion in the thread I referenced and the questions in
> > > > my prior message.
> > > >
> > >
> > > Hi Darius,
> > >
> > > As I mentioned before, we want to treat vector ISA like a general ISA
> > > instead of a specific IP. User program should be able to use it
> > > transparently just like FPU.
> > > It seems that the use case you want is asking user to use vector like
> > > a specific IP, user program should ask kernel before they use it and
> > > that is not what we want to do in this patchset.
> > >
> >
> > Hi Greentime,
> >
> > Right.
> >
> > But beyond what I want to do or what you want to do, is what *should* Linux
> > do?  I have attempted to provide evidence to support my position.  You have
> > not responded to or addressed the majority of my questions, which is
> > concerning to me.
> 
> Hi Darius,
> 
> What is your majority questions?
> 

1. How will memory allocation failures for context state memory be reported
to user space?

2. How will a system administrator (i.e., the user) be able to effectively
manage a system where the vector unit, which could have a considerable area
and/or power impact to the system, has one or more of the following
properties:

  a. A single vector unit shared among two or more harts,

  b. Additional power consumption when the vector unit is enabled and idle
versus not being enabled at all,

  c. For a system which supports variable operating frequency, a reduction
in the maximum frequency when the vector unit is enabled, and/or

  d. The inability to enter low power states and/or delays to low power
states transitions when the vector unit is enabled.

3. You contend that the RISC-V V-extension resembles ARM SVE/SVE2, at least
more than Intel AMX.  I do not agree, but nevertheless, why then does this
patchset not behave similar to SVE?  On arm64, SVE is only enabled and
memory is only allocated on first use, *not* unconditionally for all tasks.

// darius


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v8 09/21] riscv: Add task switch support for vector
  2021-10-27 12:58                           ` Darius Rad
@ 2021-11-09  9:49                             ` Greentime Hu
  2021-11-09 19:21                               ` Darius Rad
  0 siblings, 1 reply; 55+ messages in thread
From: Greentime Hu @ 2021-11-09  9:49 UTC (permalink / raw)
  To: Greentime Hu, Vincent Chen, Paul Walmsley, linux-riscv,
	Linux Kernel Mailing List, Albert Ou, Palmer Dabbelt

Darius Rad <darius@bluespec.com> 於 2021年10月27日 週三 下午8:58寫道:
>
> On Tue, Oct 26, 2021 at 12:44:31PM +0800, Greentime Hu wrote:
> > Darius Rad <darius@bluespec.com> 於 2021年10月26日 週二 上午12:22寫道:
> > >
> > > On Mon, Oct 25, 2021 at 12:47:49PM +0800, Greentime Hu wrote:
> > > > Darius Rad <darius@bluespec.com> 於 2021年10月22日 週五 下午6:40寫道:
> > > > >
> > > > > On Fri, Oct 22, 2021 at 11:52:01AM +0800, Vincent Chen wrote:
> > > > > > On Thu, Oct 21, 2021 at 6:50 PM Darius Rad <darius@bluespec.com> wrote:
> > > > > > >
> > > > > > > On Wed, Oct 20, 2021 at 06:01:31PM -0700, Paul Walmsley wrote:
> > > > > > > > Hello Darius,
> > > > > > > >
> > > > > > > > On Tue, 5 Oct 2021, Darius Rad wrote:
> > > > > > > >
> > > > > > > > > On Mon, Oct 04, 2021 at 08:36:30PM +0800, Greentime Hu wrote:
> > > > > > > > > > Darius Rad <darius@bluespec.com> 於 2021年9月29日 週三 下午9:28寫道:
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Sep 28, 2021 at 10:56:52PM +0800, Greentime Hu wrote:
> > > > > > > > > > > > Darius Rad <darius@bluespec.com> 於 2021年9月13日 週一 下午8:21寫道:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On 9/8/21 1:45 PM, Greentime Hu wrote:
> > > > > > > > > > > > > > This patch adds task switch support for vector. It supports partial lazy
> > > > > > > > > > > > > > save and restore mechanism. It also supports all lengths of vlen.
> > > > > > > >
> > > > > > > > [ ... ]
> > > > > > > >
> > > > > > > > > > > > > So this will unconditionally enable vector instructions, and allocate
> > > > > > > > > > > > > memory for vector state, for all processes, regardless of whether vector
> > > > > > > > > > > > > instructions are used?
> > > > > > > > > > > >
> > > > > > > > > > > > Yes, it will enable vector if has_vector() is true. The reason that we
> > > > > > > > > > > > choose to enable and allocate memory for user space program is because
> > > > > > > > > > > > we also implement some common functions in the glibc such as memcpy
> > > > > > > > > > > > vector version and it is called very often by every process. So that
> > > > > > > > > > > > we assume if the user program is running in a CPU with vector ISA
> > > > > > > > > > > > would like to use vector by default. If we disable it by default and
> > > > > > > > > > > > make it trigger the illegal instruction, that might be a burden since
> > > > > > > > > > > > almost every process will use vector glibc memcpy or something like
> > > > > > > > > > > > that.
> > > > > > > > > > >
> > > > > > > > > > > Do you have any evidence to support the assertion that almost every process
> > > > > > > > > > > would use vector operations?  One could easily argue that the converse is
> > > > > > > > > > > true: no existing software uses the vector extension now, so most likely a
> > > > > > > > > > > process will not be using it.
> > > > > > > > > >
> > > > > > > > > > Glibc ustreaming is just starting so you didn't see software using the
> > > > > > > > > > vector extension now and this patchset is testing based on those
> > > > > > > > > > optimized glibc too. Vincent Chen is working on the glibc vector
> > > > > > > > > > support upstreaming and we will also upstream the vector version glibc
> > > > > > > > > > memcpy, memcmp, memchr, memmove, memset, strcmp, strlen. Then we will
> > > > > > > > > > see platform with vector support can use vector version mem* and str*
> > > > > > > > > > functions automatically based on ifunc and platform without vector
> > > > > > > > > > will use the original one automatically. These could be done to select
> > > > > > > > > > the correct optimized glibc functions by ifunc mechanism.
> > > > > > > >
> > > > > > > > In your reply, I noticed that you didn't address Greentime's response
> > > > > > > > here.  But this looks like the key issue.  If common library functions are
> > > > > > > > vector-accelerated, wouldn't it make sense that almost every process would
> > > > > > > > wind up using vector instructions?  And thus there wouldn't be much point
> > > > > > > > to skipping the vector context memory allocation?
> > > > > > > >
> > > > > > >
> > > > > > > This issue was addressed in the thread regarding Intel AMX I linked to in a
> > > > > > > previous message.  I don't agree that this is the key issue; it is one of a
> > > > > > > number of issues.  What if I don't want to take the potential
> > > > > > > power/frequency hit for the vector unit for a workload that, at best, uses
> > > > > > > it for the occasional memcpy?  What if the allocation fails, how will that
> > > > > >
> > > > > > Hi Darius,
> > > > > > The memcpy function seems not to be occasionally used in the programs
> > > > > > because many functions in Glibc use memcpy() to complete the memory
> > > > > > copy. I use the following simple case as an example.
> > > > > > test.c
> > > > > > void main(void) {
> > > > > >     return;
> > > > > > }
> > > > > > Then, we compile it by "gcc test.c -o a.out" and execute it. In the
> > > > > > execution, the memcpy() has been called unexpectedly. It is because
> > > > > > many libc initialized functions will be executed before entering the
> > > > > > user-defined main function. One of the example is __libc_setup_tls(),
> > > > > > which is called by __libc_start_main(). The __libc_setup_tls() will
> > > > > > use memcpy() during the process of creating the Dynamic Thread Vector
> > > > > > (DTV).
> > > > > >
> > > > > > Therefore, I think the memcpy() is widely used in most programs.
> > > > > >
> > > > >
> > > > > You're missing my point.  Not every (any?) program spends a majority of the
> > > > > time doing memcpy(), and even if a program did, all of my points are still
> > > > > valid.
> > > > >
> > > > > Please read the discussion in the thread I referenced and the questions in
> > > > > my prior message.
> > > > >
> > > >
> > > > Hi Darius,
> > > >
> > > > As I mentioned before, we want to treat vector ISA like a general ISA
> > > > instead of a specific IP. User program should be able to use it
> > > > transparently just like FPU.
> > > > It seems that the use case you want is asking user to use vector like
> > > > a specific IP, user program should ask kernel before they use it and
> > > > that is not what we want to do in this patchset.
> > > >
> > >
> > > Hi Greentime,
> > >
> > > Right.
> > >
> > > But beyond what I want to do or what you want to do, is what *should* Linux
> > > do?  I have attempted to provide evidence to support my position.  You have
> > > not responded to or addressed the majority of my questions, which is
> > > concerning to me.
> >
> > Hi Darius,
> >
> > What is your majority questions?
> >
>
> 1. How will memory allocation failures for context state memory be reported
> to user space?

it will return -ENOMEM for some cases or show the warning messages for
some cases.
We know it's not perfect, we should enhance that in the future, but
let's take an example: 256 bits vector length system. 256 bits * 32
registers /8 = 1KB.

> 2. How will a system administrator (i.e., the user) be able to effectively
> manage a system where the vector unit, which could have a considerable area
> and/or power impact to the system, has one or more of the following
> properties:

As I mentioned before,
We would like to let user use vector transparently just like FPU or
other extensions.
If user knows that this system supports vector and user uses intrinsic
functions or assembly codes or compiler generating vector codes, user
can just use it just like FPU.
If user doesn't know that whether this system support vector or not,
user can just use the glibc or ifunc in his own libraries to detect
vector support dynamically.

>   a. A single vector unit shared among two or more harts,
>
>   b. Additional power consumption when the vector unit is enabled and idle
> versus not being enabled at all,
>
>   c. For a system which supports variable operating frequency, a reduction
> in the maximum frequency when the vector unit is enabled, and/or
>
>   d. The inability to enter low power states and/or delays to low power
> states transitions when the vector unit is enabled.

We also don't support this kind of system(a single vector unit shared
among 2 or more harts) in this implementation. I'll add more
assumptions in the next version patches.
For the frequency or power issues, I'll also not treat them as a
special case since we want to treat vector like an normal extension of
ISA instead of a specific IP.

> 3. You contend that the RISC-V V-extension resembles ARM SVE/SVE2, at least
> more than Intel AMX.  I do not agree, but nevertheless, why then does this
> patchset not behave similar to SVE?  On arm64, SVE is only enabled and
> memory is only allocated on first use, *not* unconditionally for all tasks.

As we mentioned before, almost every user space program will use glibc
ld.so/libc.so and these libraries will use the vector version
optimization porting in a system with vector support.
That's why we don't let it trigger the first illegal instruction
exception because vector registers will be used at very beginning of
the program and the illegal instruction exception handling is also
heavier because we need to go to M-mode first than S-mode which means
we will need to save context and restore context twice in both M-mode
and S-mode. Since the vstate will be allocated eventually, why not
save these 2 context save/restore overhead in both M-mode and S-mode
for handling this illegal instruction exception. And if the system
doesn't support vector, glibc won't use the vector porting and the
Linux kernel won't allocate vstate for every process too.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v8 09/21] riscv: Add task switch support for vector
  2021-11-09  9:49                             ` Greentime Hu
@ 2021-11-09 19:21                               ` Darius Rad
  0 siblings, 0 replies; 55+ messages in thread
From: Darius Rad @ 2021-11-09 19:21 UTC (permalink / raw)
  To: Greentime Hu
  Cc: Vincent Chen, Paul Walmsley, linux-riscv,
	Linux Kernel Mailing List, Albert Ou, Palmer Dabbelt

On Tue, Nov 09, 2021 at 05:49:03PM +0800, Greentime Hu wrote:
> Darius Rad <darius@bluespec.com> 於 2021年10月27日 週三 下午8:58寫道:
> >
> > On Tue, Oct 26, 2021 at 12:44:31PM +0800, Greentime Hu wrote:
> > > Darius Rad <darius@bluespec.com> 於 2021年10月26日 週二 上午12:22寫道:
> > > >
> > > > On Mon, Oct 25, 2021 at 12:47:49PM +0800, Greentime Hu wrote:
> > > > > Darius Rad <darius@bluespec.com> 於 2021年10月22日 週五 下午6:40寫道:
> > > > > >
> > > > > > On Fri, Oct 22, 2021 at 11:52:01AM +0800, Vincent Chen wrote:
> > > > > > > On Thu, Oct 21, 2021 at 6:50 PM Darius Rad <darius@bluespec.com> wrote:
> > > > > > > >
> > > > > > > > On Wed, Oct 20, 2021 at 06:01:31PM -0700, Paul Walmsley wrote:
> > > > > > > > > Hello Darius,
> > > > > > > > >
> > > > > > > > > On Tue, 5 Oct 2021, Darius Rad wrote:
> > > > > > > > >
> > > > > > > > > > On Mon, Oct 04, 2021 at 08:36:30PM +0800, Greentime Hu wrote:
> > > > > > > > > > > Darius Rad <darius@bluespec.com> 於 2021年9月29日 週三 下午9:28寫道:
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Sep 28, 2021 at 10:56:52PM +0800, Greentime Hu wrote:
> > > > > > > > > > > > > Darius Rad <darius@bluespec.com> 於 2021年9月13日 週一 下午8:21寫道:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On 9/8/21 1:45 PM, Greentime Hu wrote:
> > > > > > > > > > > > > > > This patch adds task switch support for vector. It supports partial lazy
> > > > > > > > > > > > > > > save and restore mechanism. It also supports all lengths of vlen.
> > > > > > > > >
> > > > > > > > > [ ... ]
> > > > > > > > >
> > > > > > > > > > > > > > So this will unconditionally enable vector instructions, and allocate
> > > > > > > > > > > > > > memory for vector state, for all processes, regardless of whether vector
> > > > > > > > > > > > > > instructions are used?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Yes, it will enable vector if has_vector() is true. The reason that we
> > > > > > > > > > > > > choose to enable and allocate memory for user space program is because
> > > > > > > > > > > > > we also implement some common functions in the glibc such as memcpy
> > > > > > > > > > > > > vector version and it is called very often by every process. So that
> > > > > > > > > > > > > we assume if the user program is running in a CPU with vector ISA
> > > > > > > > > > > > > would like to use vector by default. If we disable it by default and
> > > > > > > > > > > > > make it trigger the illegal instruction, that might be a burden since
> > > > > > > > > > > > > almost every process will use vector glibc memcpy or something like
> > > > > > > > > > > > > that.
> > > > > > > > > > > >
> > > > > > > > > > > > Do you have any evidence to support the assertion that almost every process
> > > > > > > > > > > > would use vector operations?  One could easily argue that the converse is
> > > > > > > > > > > > true: no existing software uses the vector extension now, so most likely a
> > > > > > > > > > > > process will not be using it.
> > > > > > > > > > >
> > > > > > > > > > > Glibc ustreaming is just starting so you didn't see software using the
> > > > > > > > > > > vector extension now and this patchset is testing based on those
> > > > > > > > > > > optimized glibc too. Vincent Chen is working on the glibc vector
> > > > > > > > > > > support upstreaming and we will also upstream the vector version glibc
> > > > > > > > > > > memcpy, memcmp, memchr, memmove, memset, strcmp, strlen. Then we will
> > > > > > > > > > > see platform with vector support can use vector version mem* and str*
> > > > > > > > > > > functions automatically based on ifunc and platform without vector
> > > > > > > > > > > will use the original one automatically. These could be done to select
> > > > > > > > > > > the correct optimized glibc functions by ifunc mechanism.
> > > > > > > > >
> > > > > > > > > In your reply, I noticed that you didn't address Greentime's response
> > > > > > > > > here.  But this looks like the key issue.  If common library functions are
> > > > > > > > > vector-accelerated, wouldn't it make sense that almost every process would
> > > > > > > > > wind up using vector instructions?  And thus there wouldn't be much point
> > > > > > > > > to skipping the vector context memory allocation?
> > > > > > > > >
> > > > > > > >
> > > > > > > > This issue was addressed in the thread regarding Intel AMX I linked to in a
> > > > > > > > previous message.  I don't agree that this is the key issue; it is one of a
> > > > > > > > number of issues.  What if I don't want to take the potential
> > > > > > > > power/frequency hit for the vector unit for a workload that, at best, uses
> > > > > > > > it for the occasional memcpy?  What if the allocation fails, how will that
> > > > > > >
> > > > > > > Hi Darius,
> > > > > > > The memcpy function seems not to be occasionally used in the programs
> > > > > > > because many functions in Glibc use memcpy() to complete the memory
> > > > > > > copy. I use the following simple case as an example.
> > > > > > > test.c
> > > > > > > void main(void) {
> > > > > > >     return;
> > > > > > > }
> > > > > > > Then, we compile it by "gcc test.c -o a.out" and execute it. In the
> > > > > > > execution, the memcpy() has been called unexpectedly. It is because
> > > > > > > many libc initialized functions will be executed before entering the
> > > > > > > user-defined main function. One of the example is __libc_setup_tls(),
> > > > > > > which is called by __libc_start_main(). The __libc_setup_tls() will
> > > > > > > use memcpy() during the process of creating the Dynamic Thread Vector
> > > > > > > (DTV).
> > > > > > >
> > > > > > > Therefore, I think the memcpy() is widely used in most programs.
> > > > > > >
> > > > > >
> > > > > > You're missing my point.  Not every (any?) program spends a majority of the
> > > > > > time doing memcpy(), and even if a program did, all of my points are still
> > > > > > valid.
> > > > > >
> > > > > > Please read the discussion in the thread I referenced and the questions in
> > > > > > my prior message.
> > > > > >
> > > > >
> > > > > Hi Darius,
> > > > >
> > > > > As I mentioned before, we want to treat vector ISA like a general ISA
> > > > > instead of a specific IP. User program should be able to use it
> > > > > transparently just like FPU.
> > > > > It seems that the use case you want is asking user to use vector like
> > > > > a specific IP, user program should ask kernel before they use it and
> > > > > that is not what we want to do in this patchset.
> > > > >
> > > >
> > > > Hi Greentime,
> > > >
> > > > Right.
> > > >
> > > > But beyond what I want to do or what you want to do, is what *should* Linux
> > > > do?  I have attempted to provide evidence to support my position.  You have
> > > > not responded to or addressed the majority of my questions, which is
> > > > concerning to me.
> > >
> > > Hi Darius,
> > >
> > > What is your majority questions?
> > >
> >
> > 1. How will memory allocation failures for context state memory be reported
> > to user space?
> 
> it will return -ENOMEM for some cases or show the warning messages for
> some cases.
> We know it's not perfect, we should enhance that in the future, but
> let's take an example: 256 bits vector length system. 256 bits * 32
> registers /8 = 1KB.

When you say "show the warning message", I assume you mean the kernel will
log a message, which is not reported to user space.  User space will only
see a process unexpectedly die.

As you say, that is not great, and could be done better.  I would be
interested in knowing how you think that could be improved without needing
a user space API, or why it will be acceptable to break the user space API
later.

> 
> > 2. How will a system administrator (i.e., the user) be able to effectively
> > manage a system where the vector unit, which could have a considerable area
> > and/or power impact to the system, has one or more of the following
> > properties:
> 
> As I mentioned before,
> We would like to let user use vector transparently just like FPU or
> other extensions.
> If user knows that this system supports vector and user uses intrinsic
> functions or assembly codes or compiler generating vector codes, user
> can just use it just like FPU.
> If user doesn't know that whether this system support vector or not,
> user can just use the glibc or ifunc in his own libraries to detect
> vector support dynamically.
> 
> >   a. A single vector unit shared among two or more harts,
> >
> >   b. Additional power consumption when the vector unit is enabled and idle
> > versus not being enabled at all,
> >
> >   c. For a system which supports variable operating frequency, a reduction
> > in the maximum frequency when the vector unit is enabled, and/or
> >
> >   d. The inability to enter low power states and/or delays to low power
> > states transitions when the vector unit is enabled.
> 
> We also don't support this kind of system(a single vector unit shared
> among 2 or more harts) in this implementation. I'll add more
> assumptions in the next version patches.
> For the frequency or power issues, I'll also not treat them as a
> special case since we want to treat vector like an normal extension of
> ISA instead of a specific IP.

The problem is that it will likely be impossible to support such systems
without changing the user space API.

If you add an API along the lines of what I suggested, even if there is not
initially support to completely handle such systems, that support could be
added in the future without change to user space.

If such an API is not in place now, that support cannot be added without
breaking user space.

> 
> > 3. You contend that the RISC-V V-extension resembles ARM SVE/SVE2, at least
> > more than Intel AMX.  I do not agree, but nevertheless, why then does this
> > patchset not behave similar to SVE?  On arm64, SVE is only enabled and
> > memory is only allocated on first use, *not* unconditionally for all tasks.
> 
> As we mentioned before, almost every user space program will use glibc

As I mentioned before, I do not agree that every user space program *will*
use vector.

glibc is not the only C library used in Linux.

Whether such support as you are alluding to gets accepted to glibc also
remains to be seen.

> ld.so/libc.so and these libraries will use the vector version
> optimization porting in a system with vector support.
> That's why we don't let it trigger the first illegal instruction
> exception because vector registers will be used at very beginning of
> the program and the illegal instruction exception handling is also
> heavier because we need to go to M-mode first than S-mode which means
> we will need to save context and restore context twice in both M-mode
> and S-mode. Since the vstate will be allocated eventually, why not
> save these 2 context save/restore overhead in both M-mode and S-mode
> for handling this illegal instruction exception. And if the system
> doesn't support vector, glibc won't use the vector porting and the
> Linux kernel won't allocate vstate for every process too.

This is not convincing.  You are saying that a single illegal instruction
exception, for a process that actively desires to use vector instructions,
is more heavy weight than an unconditional allocation of up to 256 kiB per
process, even for those processes that do not use vector.


^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2021-11-09 19:21 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-08 17:45 [RFC PATCH v8 00/21] riscv: Add vector ISA support Greentime Hu
2021-09-08 17:45 ` [RFC PATCH v8 01/21] riscv: Separate patch for cflags and aflags Greentime Hu
2021-09-08 17:45 ` [RFC PATCH v8 02/21] riscv: Rename __switch_to_aux -> fpu Greentime Hu
2021-09-08 17:45 ` [RFC PATCH v8 03/21] riscv: Extending cpufeature.c to detect V-extension Greentime Hu
2021-09-08 17:45 ` [RFC PATCH v8 04/21] riscv: Add new csr defines related to vector extension Greentime Hu
2021-09-08 17:45 ` [RFC PATCH v8 05/21] riscv: Add vector feature to compile Greentime Hu
2021-09-08 17:45 ` [RFC PATCH v8 06/21] riscv: Add has_vector/riscv_vsize to save vector features Greentime Hu
2021-09-08 17:45 ` [RFC PATCH v8 07/21] riscv: Reset vector register Greentime Hu
2021-09-08 17:45 ` [RFC PATCH v8 08/21] riscv: Add vector struct and assembler definitions Greentime Hu
2021-09-08 17:45 ` [RFC PATCH v8 09/21] riscv: Add task switch support for vector Greentime Hu
2021-09-13 12:21   ` Darius Rad
2021-09-28 14:56     ` Greentime Hu
2021-09-29 13:28       ` Darius Rad
2021-10-01  2:46         ` Ley Foon Tan
2021-10-04 12:41           ` Greentime Hu
2021-10-05  2:12             ` Ley Foon Tan
2021-10-05 15:46               ` Greentime Hu
2021-10-07 10:10                 ` Ley Foon Tan
2021-10-04 12:36         ` Greentime Hu
2021-10-05 13:57           ` Darius Rad
2021-10-21  1:01             ` Paul Walmsley
2021-10-21 10:50               ` Darius Rad
2021-10-22  3:52                 ` Vincent Chen
2021-10-22 10:40                   ` Darius Rad
2021-10-25  4:47                     ` Greentime Hu
2021-10-25 16:22                       ` Darius Rad
2021-10-26  4:44                         ` Greentime Hu
2021-10-27 12:58                           ` Darius Rad
2021-11-09  9:49                             ` Greentime Hu
2021-11-09 19:21                               ` Darius Rad
2021-10-26 14:58                     ` Heiko Stübner
2021-09-08 17:45 ` [RFC PATCH v8 10/21] riscv: Add ptrace vector support Greentime Hu
2021-09-08 17:45 ` [RFC PATCH v8 11/21] riscv: Add sigcontext save/restore for vector Greentime Hu
2021-09-30  2:37   ` Ley Foon Tan
2021-09-08 17:45 ` [RFC PATCH v8 12/21] riscv: signal: Report signal frame size to userspace via auxv Greentime Hu
2021-09-08 17:45 ` [RFC PATCH v8 13/21] riscv: Add support for kernel mode vector Greentime Hu
2021-09-09  6:17   ` Christoph Hellwig
2021-09-08 17:45 ` [RFC PATCH v8 14/21] riscv: Use CSR_STATUS to replace sstatus in vector.S Greentime Hu
2021-09-08 17:45 ` [RFC PATCH v8 15/21] riscv: Add vector extension XOR implementation Greentime Hu
2021-09-09  6:12   ` Christoph Hellwig
2021-09-28  7:00     ` Greentime Hu
2021-09-14  8:29   ` Ley Foon Tan
2021-09-28  7:01     ` Greentime Hu
2021-09-08 17:45 ` [RFC PATCH v8 16/21] riscv: Initialize vector registers with proper vsetvli then it can work normally Greentime Hu
2021-09-08 17:45 ` [RFC PATCH v8 17/21] riscv: Optimize vector registers initialization Greentime Hu
2021-09-08 17:45 ` [RFC PATCH v8 18/21] riscv: Fix an illegal instruction exception when accessing vlenb without enable vector first Greentime Hu
2021-09-08 17:45 ` [RFC PATCH v8 19/21] riscv: Allocate space for vector registers in start_thread() Greentime Hu
2021-09-08 17:45 ` [RFC PATCH v8 20/21] riscv: Optimize task switch codes of vector Greentime Hu
2021-09-15 14:29   ` Jisheng Zhang
2021-10-04 14:13     ` Greentime Hu
2021-09-08 17:45 ` [RFC PATCH v8 21/21] riscv: Turn has_vector into a static key if VECTOR=y Greentime Hu
2021-09-15 14:24   ` Jisheng Zhang
2021-10-04 15:04     ` Greentime Hu
2021-09-13  1:47 ` [RFC PATCH v8 00/21] riscv: Add vector ISA support Vincent Chen
2021-09-13 17:18 ` Vineet Gupta

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).