LKML Archive on lore.kernel.org
* [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions
From: Chang S. Bae @ 2021-10-01 22:36 UTC
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, lenb, dave.hansen, thiago.macieira, jing2.liu,
	ravi.v.shankar, linux-kernel, chang.seok.bae

Intel Advanced Matrix Extensions (AMX)[1][2] will be shipping on servers
soon (Intel Sapphire Rapids).  AMX consists of configurable TMM "TILE"
registers plus new CPU instructions that operate on them.  TMUL (Tile
matrix MULtiply) is the first operator to take advantage of the new
registers, and we anticipate additional instructions in the future.

Neither AMX state nor TMUL instructions depend on AVX.  However, AMX and
AVX do share common challenges.  The TMM registers are 8KB today, and
architecturally as large as 64KB, a size that merits updates to hardware
and software state management.

Further, both technologies run faster when they are not simultaneously
running on SMT siblings, and both technologies' use of power and bandwidth
impacts the power and performance available to neighboring cores.  (This
impact has measurably improved in recent hardware.)

If the existing kernel approach for managing XSAVE state were employed to
handle AMX, 8KB of space would be added to every task, even though it may
rarely be used.  Thus, Linux implements on-demand expansion of per-task
context switch buffers using an XSAVE feature: eXtended Feature Disabling
(XFD).  The kernel arms XFD to provide an #NM exception upon a task's first
access to TILE state.  The kernel exception handler allocates and installs
the appropriate XSAVE context switch buffer.  User space is unaware of the
kernel's context switch buffer optimization.
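To make the on-demand scheme concrete, here is a rough user-space model of it in C. It is only an illustration, not kernel code: struct task_model, handle_nm(), touch_tile() and both buffer sizes are hypothetical. A task starts with a small buffer and XFD armed; the first simulated TILE access takes a simulated #NM, which grows the buffer and disarms XFD, so later accesses do not fault.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative sizes only: a legacy buffer, and one with 8KB of TILE room. */
#define LEGACY_XSTATE_SIZE   2688u
#define EXPANDED_XSTATE_SIZE (LEGACY_XSTATE_SIZE + 8192u)

/* Hypothetical per-task model; this is not the kernel's struct fpu. */
struct task_model {
	unsigned char *xstate;	/* context switch buffer */
	size_t size;		/* current buffer size */
	bool xfd_armed;		/* true: a TILE access raises #NM */
	unsigned int nm_count;	/* number of #NM faults taken */
};

static int task_init(struct task_model *t)
{
	t->xstate = calloc(1, LEGACY_XSTATE_SIZE);
	if (!t->xstate)
		return -1;
	t->size = LEGACY_XSTATE_SIZE;
	t->xfd_armed = true;	/* XFD starts armed for every task */
	t->nm_count = 0;
	return 0;
}

/* Simulated #NM handler: grow the buffer, keep old contents, disarm XFD. */
static int handle_nm(struct task_model *t)
{
	unsigned char *bigger;

	t->nm_count++;
	bigger = realloc(t->xstate, EXPANDED_XSTATE_SIZE);
	if (!bigger)
		return -1;	/* the real kernel would signal the task here */
	memset(bigger + t->size, 0, EXPANDED_XSTATE_SIZE - t->size);
	t->xstate = bigger;
	t->size = EXPANDED_XSTATE_SIZE;
	t->xfd_armed = false;
	return 0;
}

/* Simulated TILE instruction: it only faults while XFD is armed. */
static int touch_tile(struct task_model *t)
{
	if (t->xfd_armed && handle_nm(t))
		return -1;
	return 0;
}
```

In this model only the first touch_tile() call takes the fault; later calls leave nm_count at 1, so the allocation cost is paid once per task.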

AMX is accessible only to applications that invoke a new system call to
request access.  When a user invokes this system call, they agree that if
they use an alternate signal stack, they are providing an alternative
signal stack of sufficient size.  The simplest way to do that is to use the
updated ABI in glibc 2.34 or later [8][9], though they could also use their
own calculation or ask the kernel directly [3].

The patches are built on top of the recent upstream x86 FPU changes [13].

This series has three parts:
* Patch 01-16: Foundation to support dynamic user states
* Patch 17-22: AMX enablement
* Patch 23-29: Additional supplementary changes for optimization, test,
  debug and new syscalls.

Note that the per-process system call in Patch15 reflects the latest
discussion on LKML [10][12].

The following points summarize the latest discussion, and this
implementation:

1. Kernel sets XCR0.AMX=1 at boot, and leaves it set, always.

    XCR0 is NOT context switched by Linux.
    (If it were, every change would provoke a VMEXIT when running in a VM.)

    (KVM context switches XCR0.  If KVM exports XFD for use by a guest OS,
    it must also context switch XFD.  KVM cannot use XFD for its own
    purposes.)

2. Kernel arms XFD for all tasks.

    XFD is context switched per Linux task.

3. Apps invoke new system call to request feature access (AMX).

    It is implemented as a flag to arch_prctl(2).  Permission granted to
    any task is granted to all tasks in the process.

    It is sufficient to invoke this syscall at process or library
    init-time.

    There is no concept of removing or revoking permission, once granted to
    a process.  (Permission is cleared upon exec of a new process.)

    There is a companion system call to return the current permission.

    Analogous to AVX-512 and other stateful features, applications probe
    for AMX support by checking CPUID for the instructions and checking
    XGETBV(XCR0) for the OS support.

    However, stateful features from AMX onward also require the system call
    above to be invoked before tasks in that process may use the feature.
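A minimal sketch of that probe for user space, assuming GCC or Clang on x86 (<cpuid.h> provides __get_cpuid() and __get_cpuid_count()). The XFEATURE_MASK_* macros are local definitions for XCR0 bits 17 (XTILECFG) and 18 (XTILEDATA), and the function names are illustrative. Passing this check only says the CPU and OS support AMX; per the point above, a process must still invoke the permission system call before touching TILE state.

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Local definitions of the two AMX state component bits in XCR0. */
#define XFEATURE_MASK_XTILECFG  (1ULL << 17)
#define XFEATURE_MASK_XTILEDATA (1ULL << 18)
#define XFEATURE_MASK_XTILE     (XFEATURE_MASK_XTILECFG | XFEATURE_MASK_XTILEDATA)

/* Pure check, kept separate so it can be tested without AMX hardware. */
static bool xcr0_enables_amx(uint64_t xcr0)
{
	return (xcr0 & XFEATURE_MASK_XTILE) == XFEATURE_MASK_XTILE;
}

static bool cpu_and_os_support_amx(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;
	uint32_t lo, hi;

	/* CPUID.1:ECX[27] (OSXSAVE): the OS enabled XSAVE, so XGETBV is usable. */
	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 27)))
		return false;
	/* CPUID.(EAX=7,ECX=0):EDX[24] (AMX-TILE): the CPU implements tiles. */
	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) || !(edx & (1u << 24)))
		return false;
	/* XGETBV with ECX=0 returns XCR0 in EDX:EAX. */
	__asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
	return xcr0_enables_amx(((uint64_t)hi << 32) | lo);
#else
	return false;
#endif
}
```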

4. Applications touching AMX without permission results in process exit.

    Armed XFD results in #NM, which results in SIGILL with si_code
    ILL_ILLOPC, typically resulting in process exit.

5. Applications touching AMX with permission allocate context switch buffer
   on-demand.

    Armed XFD results in #NM.
    Kernel allocates large context switch kernel buffer.
    Kernel disarms XFD for that task.

6. NM handler allocation failure results in process exit.

    If the #NM handler cannot allocate the 8KB buffer, the task will
    receive a SIGSEGV, typically resulting in process exit.

7. Legacy app signal stack XSTATE support includes AVX-512, and stops
   before AMX.

    Legacy apps are those which do not request AMX (or subsequent feature)
    access.  The signal stack ABI continues to be uncompacted XSTATE for
    both legacy and new apps.

    Existing code to find offsets in XSTATE still works.
    Existing code doing XRSTOR/XSAVE on the signal stack buffer will still
    work.*

    * XSTATE size calculation using CPUID will include AMX and other
      supported features, even if the process did not invoke the new
      system call.  However, the kernel will not XSAVE AMX or later
      features onto the signal stack of a legacy process.**

    ** User-space XSAVE/XRSTOR should size buffers according to CPUID
       if they include the bits of xgetbv(XCR0) in RFBM, because XSAVE
       will write data (including zeros for INIT state) for all features
       included in RFBM.
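As a sketch of the footnote's advice, the buffer size for an XSAVE whose RFBM covers all of XCR0 can be read from CPUID leaf 0xD, sub-leaf 0, register EBX. This assumes GCC or Clang on x86; alloc_xsave_buffer() is a hypothetical helper. It also honors the 64-byte alignment XSAVE requires and zeroes the area so the XSAVE header starts out clean.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

#define XSAVE_ALIGN    64u	/* XSAVE/XRSTOR need a 64-byte aligned area */
#define XSAVE_MIN_SIZE 576u	/* 512-byte legacy region + 64-byte header */

/* CPUID.(EAX=0DH,ECX=0):EBX = bytes needed for every feature enabled in XCR0. */
static unsigned int xsave_size_for_xcr0(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	if (__get_cpuid_count(0xd, 0, &eax, &ebx, &ecx, &edx))
		return ebx;
#endif
	return 0;
}

/* Hypothetical helper: zeroed, aligned buffer for XSAVE with RFBM = XCR0. */
static void *alloc_xsave_buffer(size_t *size_out)
{
	size_t size = xsave_size_for_xcr0();
	void *buf;

	if (size < XSAVE_MIN_SIZE)
		return NULL;	/* no usable XSAVE support reported */
	/* C11 aligned_alloc() wants the size to be a multiple of the alignment. */
	size = (size + XSAVE_ALIGN - 1) & ~(size_t)(XSAVE_ALIGN - 1);
	buf = aligned_alloc(XSAVE_ALIGN, size);
	if (!buf)
		return NULL;
	memset(buf, 0, size);	/* start with an all-zero XSAVE header */
	*size_out = size;
	return buf;
}
```

Because the size comes from CPUID rather than a constant, it automatically includes AMX when XCR0 enables it, whether or not the process requested AMX permission.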

8. New opt-in apps must agree to provide a large enough sigaltstack.

    1. must invoke the permission system call before touching AMX TMM
    2. must guarantee, if using sigaltstack(2), that they have
       allocated a signal stack of sufficient size, e.g., by using
       the signal.h of glibc 2.34 or later.

    (glibc 2.34 changed MINSIGSTKSZ and SIGSTKSZ from 2KB/8KB constants
    into run-time routines. [8])

    Linux will continue to XSAVE/XRSTOR directly to/from the signal stack,
    and the stack will always include the 8KB *space* for AMX TMM and
    subsequent features.

    Linux has an optimization for all XFD-supported features in the INIT
    state so that XSAVE will skip writing zeros for them.
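For the second requirement, a hedged example of sizing an alternate signal stack at run time, assuming glibc 2.34 or later, where sysconf(_SC_SIGSTKSZ) is available [8]. ALTSTACK_FALLBACK_SIZE is an arbitrary assumption used only when that query is unavailable; it is not a kernel-defined value, and install_altstack() is an illustrative helper.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

/* Assumed fallback, used only if the run-time query is unavailable. */
#define ALTSTACK_FALLBACK_SIZE (64 * 1024)

static size_t altstack_size(void)
{
	long sz = -1;

#ifdef _SC_SIGSTKSZ
	/* glibc 2.34+: computed at run time from what the kernel reports. */
	sz = sysconf(_SC_SIGSTKSZ);
#endif
	if (sz <= 0)
		sz = ALTSTACK_FALLBACK_SIZE;
	return (size_t)sz;
}

/* Allocate and install an alternate signal stack of run-time size. */
static int install_altstack(stack_t *out)
{
	stack_t ss;

	ss.ss_size = altstack_size();
	ss.ss_sp = malloc(ss.ss_size);
	if (!ss.ss_sp)
		return -1;
	ss.ss_flags = 0;
	if (sigaltstack(&ss, NULL) != 0) {
		free(ss.ss_sp);
		return -1;
	}
	if (out)
		*out = ss;
	return 0;
}
```

Call install_altstack() before installing handlers with SA_ONSTACK.  Using the run-time value rather than a compile-time constant keeps the stack large enough once AMX state is written to the signal frame.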

9. intel_idle for SPR will clear AMX TMM state

    This guarantees that AMX use will not prevent the CPU from entering the
    idle C6 state, which can be beneficial for power savings, and thus
    turbo frequency.

Reviewed-by: Len Brown <len.brown@intel.com>

Changes from v10 [18]:
* Expand permission-required states to include XTILECFG and later, rather
  than just XFD-protected states (Patch15, Patch16, and Patch20).
* Add static allocation feature flag: ARCH_SET_STATE_ENABLE_ALLOC
  (Patch29).
* Add sanity checks to sigaltstack() and ARCH_SET_STATE_ENABLE syscalls
  (Patch16). (Thomas Gleixner)
* Update comment and a variable name for XSTATE calculation function
  (Patch9). (Dave Hansen)
* Raise SIGSEGV rather than SIGILL when XSTATE buffer reallocation fails
  (Patch13). (Thiago Macieira)
* Simplify the sigframe XSAVE code: replace check for XFD STATE with
  XTILECFG and later STATE (Patch23).
* Simplify syscall implementation - no functional change (Patch15).
* Fix the changelog and the comment for ARCH_SET_STATE_ENABLE (Patch15).
* Fix to access fpu->state_mask via the helper (Patch16).
* Update the selftest for v11 (Patch24).

Changes from v9 [17]:
* Simplify and rename helpers for managing XSTATE buffer (Patch9,11).
  (Borislav Petkov)
* Simplify and use permission check helpers (Patch15,16).
* Remove access helpers (Patch6). (Borislav Petkov)
* Rename XSTATE address finder helper (Patch11). (Borislav Petkov)
* Simplify ptrace path code (Patch14). (Borislav Petkov)
* Use cpu_feature_enabled() whenever possible (Patch9,13,15,23,26,27).
  (Borislav Petkov)
* Add comment for tile_release() use (Patch26). (Dave Hansen)
* Update code comment and/or changelog (Patch6,7). (Borislav Petkov)
* Update the cover letter to indicate SPR explicitly (Patch0). (Dave
  Hansen)
* Update XFD enabling code (Patch13). (Borislav Petkov)
* Move the state copy function changes. (Patch1,9,12). (Borislav
  Petkov)

Changes from v8 [16]:
* Update arch_prctl prototype for consistency with other arch_prctl's.  It
  now takes an address of return bitmask as a parameter (Patch14).  Update
  self-tests to reflect this (Patch23).
* bugfix: Fix off-by-one error in check_xstate_against_struct() feature
  number argument (Patch19).

Changes from v7 [15]:
* Update #NM handler to raise SIGILL rather than SIGSEGV (Patch 12).
  (Thiago Macieira)
* Rename the syscalls (Patch 14). (Thiago Macieira and Dave Hansen)
* If XSAVE is disabled, ensure that the syscall correctly indicates legacy
  states (Patch14). (Thiago Macieira and Dave Hansen)
* Update existing self-test to expect SIGILL (Patch23).

Changes from v6 [14]:
* Add state bitmap param to proposed syscall. (Thiago Macieira)
* Add companion syscall to return the current permission bitmap.
* Update the ptrace path to return EFAULT when no permission to write
  XTILEDATA.
* Simplify xstate size calculation code. (Dave Hansen)
* Update comments for TILERELEASE code. (Rafael J. Wysocki)

Changes from v5 [11]:
* Updated to require per-process permission for dynamic states (v5 was
  per-task).
* Support both legacy and expanded sigframe xstate buffer sizes.
* Moved the TILERELEASE code to intel_idle driver. (Peter Zijlstra)
* Fixed to deactivate fpregs with TILERELEASE. (Andy Lutomirski and Dave
  Hansen)
* Rebased on Thomas Gleixner's recent x86 FPU code changes.
* Added XFD sanity check. (Dave Hansen)
* Future proofed __raw_xsave_addr().
* Tighten up task size calculation (previously, it could over-calculate).
* Cleaned up the memset() invocation for init_fpstate (no functional change).
* Updated selftest to handle latest syscall semantics, plus minor updates.
* Dropped the change for XSTATE restore helper.

Changes from v4 [7]:
* Changed the buffer expansion policy to the access-request based approach
  from the transparent #NM-based approach. (Andy Lutomirski, Thomas
  Gleixner, et al.)
* Removed the boot parameter patch. (Thomas Gleixner)
* Included code to explicitly initialize AMX state during a context switch.
  (Thomas Gleixner)
* Added a new arch_prctl to pre-allocate a buffer for dynamic state. (Andy
  Lutomirski)
* Updated the fork() path to initialize all the AMX state.
* Improved ptracer's dynamic user state injection path.
* Add optimization to skip tile data in sigframe when an AMX thread
  initialized the state.
* Updated to treat the mismatched state size as an error. (Thomas Gleixner)
* Simplified the xstate feature check routine. (Thomas Gleixner)
* Simplified and updated the selftest.
* Updated some changelogs. (Thomas Gleixner)
* Updated a function description. (Borislav Petkov)

Changes from v3 [6]:
* Updated some commit messages and code comments. (Borislav Petkov)
* Added and removed some helpers. (Borislav Petkov)
* Revised the buffer allocation function. (Borislav Petkov)
* Simplified buffer access. (Borislav Petkov)
* Re-organized some code changes to be more reviewable. (PATCH9/10)
* Reverted unnecessary changes. (PATCH4)
* Fixed typo in the documentation. (Randy Dunlap)

Changes from v2 [5]:
* Removed the patch for the tile data inheritance. Also, updated the
  selftest patch. (Andy Lutomirski)
* Changed to taint the kernel when any unknown state is enabled. (Andy
  Lutomirski)
* Changed to use the XFD feature only when the compacted format is in use.
* Improved the test code.
* Simplified the cmdline handling.
* Removed 'task->fpu' in changelogs. (Boris Petkov)
* Updated the variable name / comments / changelogs for clarification.

Changes from v1 [4]:
* Added vmalloc() error tracing (Dave Hansen, PeterZ, and Andy Lutomirski)
* Inlined the #NM handling code (Andy Lutomirski)
* Made signal handling optimization revertible
* Revised the new parameter handling code (Andy Lutomirski and Dave Hansen)
* Rebased on the upstream kernel

[1]: Intel Architecture Instruction Set Extension Programming Reference
     May 2021, https://software.intel.com/content/dam/develop/external/us/en/documents-tps/architecture-instruction-set-extensions-programming-reference.pdf
[2]: https://software.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/compiler-reference/intrinsics/intrinsics-for-intel-advanced-matrix-extensions-intel-amx-instructions.html
[3]: https://lore.kernel.org/lkml/20210518200320.17239-1-chang.seok.bae@intel.com/
[4]: https://lore.kernel.org/lkml/20201001203913.9125-1-chang.seok.bae@intel.com/
[5]: https://lore.kernel.org/lkml/20201119233257.2939-1-chang.seok.bae@intel.com/
[6]: https://lore.kernel.org/lkml/20201223155717.19556-1-chang.seok.bae@intel.com/
[7]: https://lore.kernel.org/lkml/20210221185637.19281-1-chang.seok.bae@intel.com/
[8]: https://sourceware.org/git/?p=glibc.git;a=commit;h=6c57d320484988e87e446e2e60ce42816bf51d53
[9]: https://sourceware.org/git/?p=glibc.git;a=blob;f=NEWS;h=aa0f10a891f8f9b4e6f0f6d25b6a307898c07d82;hb=HEAD#l12
[10]: https://lore.kernel.org/lkml/CALCETrW2QHa2TLvnUuVxAAheqcbSZ-5_WRXtDSAGcbG8N+gtdQ@mail.gmail.com/
[11]: https://lore.kernel.org/lkml/20210523193259.26200-1-chang.seok.bae@intel.com/
[12]: https://lore.kernel.org/lkml/CAJvTdKmzN0VMyH8VU_fdzn2UZqmR=_aNrJW01a65BhyLm6YRPg@mail.gmail.com/
[13]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1423e2660cf134a8f21f2451865a04792013e49e
[14]: https://lore.kernel.org/lkml/20210630060226.24652-1-chang.seok.bae@intel.com/
[15]: https://lore.kernel.org/lkml/20210710130313.5072-1-chang.seok.bae@intel.com/
[16]: https://lore.kernel.org/lkml/20210717152903.7651-1-chang.seok.bae@intel.com/
[17]: https://lore.kernel.org/lkml/20210730145957.7927-1-chang.seok.bae@intel.com/
[18]: https://lore.kernel.org/lkml/20210825155413.19673-1-chang.seok.bae@intel.com/

Chang S. Bae (29):
  x86/fpu/xstate: Fix the state copy function to the XSTATE buffer
  x86/fpu/xstate: Modify the initialization helper to handle both static
    and dynamic buffers
  x86/fpu/xstate: Modify state copy helpers to handle both static and
    dynamic buffers
  x86/fpu/xstate: Modify address finders to handle both static and
    dynamic buffers
  x86/fpu/xstate: Add a new variable to indicate dynamic user states
  x86/fpu/xstate: Add new variables to indicate dynamic XSTATE buffer
    size
  x86/fpu/xstate: Calculate and remember dynamic XSTATE buffer sizes
  x86/fpu/xstate: Convert the struct fpu 'state' field to a pointer
  x86/fpu/xstate: Introduce helpers to manage the XSTATE buffer
    dynamically
  x86/fpu/xstate: Update the XSTATE save function to support dynamic
    states
  x86/fpu/xstate: Update the XSTATE buffer address finder to support
    dynamic states
  x86/fpu/xstate: Update the XSTATE context copy function to support
    dynamic states
  x86/fpu/xstate: Use feature disable (XFD) to protect dynamic user
    state
  x86/fpu/xstate: Support ptracer-induced XSTATE buffer expansion
  x86/arch_prctl: Create ARCH_SET_STATE_ENABLE/ARCH_GET_STATE_ENABLE
  x86/fpu/xstate: Support both legacy and expanded signal XSTATE size
  x86/fpu/xstate: Adjust the XSAVE feature table to address gaps in
    state component numbers
  x86/fpu/xstate: Disable XSTATE support if an inconsistent state is
    detected
  x86/cpufeatures/amx: Enumerate Advanced Matrix Extension (AMX) feature
    bits
  x86/fpu/amx: Define AMX state components and have it used for
    boot-time checks
  x86/fpu/amx: Initialize child's AMX state
  x86/fpu/amx: Enable the AMX feature in 64-bit mode
  x86/fpu/xstate: Skip writing zeros to signal frame for dynamic user
    states if in INIT-state
  selftest/x86/amx: Test cases for the AMX state management
  x86/insn/amx: Add TILERELEASE instruction to the opcode map
  intel_idle/amx: Add SPR support with XTILEDATA capability
  x86/fpu/xstate: Add a sanity check for XFD state when saving XSTATE
  x86/arch_prctl: ARCH_GET_FEATURES_WITH_KERNEL_ASSISTANCE
  x86/arch_prctl: ARCH_SET_STATE_ENABLE_ALLOC

 arch/x86/include/asm/cpufeatures.h    |    4 +
 arch/x86/include/asm/fpu/internal.h   |   92 ++-
 arch/x86/include/asm/fpu/types.h      |   99 ++-
 arch/x86/include/asm/fpu/xstate.h     |   95 ++-
 arch/x86/include/asm/msr-index.h      |    2 +
 arch/x86/include/asm/processor.h      |   10 +-
 arch/x86/include/asm/proto.h          |    2 +-
 arch/x86/include/asm/special_insns.h  |    6 +
 arch/x86/include/asm/thread_info.h    |    3 +
 arch/x86/include/asm/trace/fpu.h      |    4 +-
 arch/x86/include/uapi/asm/prctl.h     |   23 +-
 arch/x86/kernel/cpu/cpuid-deps.c      |    4 +
 arch/x86/kernel/fpu/core.c            |  113 ++-
 arch/x86/kernel/fpu/init.c            |   37 +-
 arch/x86/kernel/fpu/regset.c          |   55 +-
 arch/x86/kernel/fpu/signal.c          |   98 ++-
 arch/x86/kernel/fpu/xstate.c          |  741 +++++++++++++++--
 arch/x86/kernel/process.c             |   26 +-
 arch/x86/kernel/process_32.c          |    7 +-
 arch/x86/kernel/process_64.c          |   16 +-
 arch/x86/kernel/traps.c               |   60 ++
 arch/x86/kvm/x86.c                    |   48 +-
 arch/x86/lib/x86-opcode-map.txt       |    8 +-
 arch/x86/math-emu/fpu_aux.c           |    2 +-
 arch/x86/math-emu/fpu_entry.c         |    4 +-
 arch/x86/math-emu/fpu_system.h        |    2 +-
 drivers/idle/intel_idle.c             |   82 ++
 kernel/signal.c                       |    8 +
 tools/arch/x86/lib/x86-opcode-map.txt |    8 +-
 tools/testing/selftests/x86/Makefile  |    2 +-
 tools/testing/selftests/x86/amx.c     | 1048 +++++++++++++++++++++++++
 31 files changed, 2456 insertions(+), 253 deletions(-)
 create mode 100644 tools/testing/selftests/x86/amx.c


base-commit: 5816b3e6577eaa676ceb00a848f0fd65fe2adc29
--
2.17.1



* [PATCH v11 01/29] x86/fpu/xstate: Fix the state copy function to the XSTATE buffer
From: Chang S. Bae @ 2021-10-01 22:37 UTC
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, lenb, dave.hansen, thiago.macieira, jing2.liu,
	ravi.v.shankar, linux-kernel, chang.seok.bae

Harden copy_uabi_to_xstate() so that it can handle the case where
__raw_xsave_addr() returns NULL. This does not happen in practice today,
but could theoretically happen in the future.
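The hardening follows a common pattern: when the address finder has no slot for a feature, skip that feature instead of dereferencing NULL. Below is a small user-space model of it; struct buf_model, find_slot(), copy_features() and the layout tables are hypothetical stand-ins, not the kernel's types or copy_uabi_to_xstate() itself.

```c
#include <stdint.h>
#include <string.h>

#define NR_FEATURES 4	/* hypothetical feature count */

/* Hypothetical layout shared by the source and destination buffers. */
static const unsigned int feat_offset[NR_FEATURES] = { 0, 16, 32, 48 };
static const unsigned int feat_size[NR_FEATURES]   = { 16, 16, 16, 16 };

struct buf_model {
	unsigned char data[64];
	uint64_t present;	/* features that have space in data[] */
};

/* Address finder that, as in the hardened case, may return NULL. */
static void *find_slot(struct buf_model *b, int nr)
{
	if (!(b->present & (1ULL << nr)))
		return NULL;
	return b->data + feat_offset[nr];
}

/* Copy each requested feature; returns how many were copied. */
static int copy_features(struct buf_model *dst, const unsigned char *src,
			 uint64_t xfeatures)
{
	int copied = 0;

	for (int i = 0; i < NR_FEATURES; i++) {
		void *slot;

		if (!(xfeatures & (1ULL << i)))
			continue;
		slot = find_slot(dst, i);
		if (!slot)
			continue;	/* no slot: skip rather than dereference NULL */
		memcpy(slot, src + feat_offset[i], feat_size[i]);
		copied++;
	}
	return copied;
}
```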

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v9:
* Add as a new patch (moved from Patch11). (Borislav Petkov)
---
 arch/x86/kernel/fpu/xstate.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index c8def1b7f8fb..fc1d529547e6 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1132,6 +1132,9 @@ static int copy_uabi_to_xstate(struct xregs_state *xsave, const void *kbuf,
 		if (hdr.xfeatures & mask) {
 			void *dst = __raw_xsave_addr(xsave, i);
 
+			if (!dst)
+				continue;
+
 			offset = xstate_offsets[i];
 			size = xstate_sizes[i];
 
-- 
2.17.1



* [PATCH v11 02/29] x86/fpu/xstate: Modify the initialization helper to handle both static and dynamic buffers
From: Chang S. Bae @ 2021-10-01 22:37 UTC
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, lenb, dave.hansen, thiago.macieira, jing2.liu,
	ravi.v.shankar, linux-kernel, chang.seok.bae, kvm

Have the function initializing the XSTATE buffer take a struct fpu *
pointer in preparation for dynamic state buffer support.

init_fpstate is a special case, which is indicated by a null pointer
parameter to fpstate_init().

Also, fpstate_init_xstate() now accepts the state component bitmap to
customize the compacted format.

No functional change.
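A simplified user-space model of the new calling convention, for illustration only: struct fpu_model, struct xsave_header_model and init_xstate() are hypothetical, while COMPACTED_FORMAT_BIT mirrors XCOMP_BV_COMPACTED_FORMAT (bit 63 of xcomp_bv). A NULL pointer selects the global init state, and the caller-supplied mask decides which components the compacted format covers.

```c
#include <stdint.h>
#include <string.h>

/* Mirrors XCOMP_BV_COMPACTED_FORMAT: bit 63 selects the compacted format. */
#define COMPACTED_FORMAT_BIT (1ULL << 63)

/* Hypothetical stand-ins for the XSAVE header and struct fpu. */
struct xsave_header_model {
	uint64_t xfeatures;
	uint64_t xcomp_bv;
	uint64_t reserved[6];
};

struct fpu_model {
	struct xsave_header_model header;
};

static struct fpu_model init_fpstate_model;	/* plays the role of init_fpstate */

static void init_xstate(struct fpu_model *fpu, uint64_t mask)
{
	/* A NULL pointer selects the global init state. */
	struct fpu_model *state = fpu ? fpu : &init_fpstate_model;

	memset(state, 0, sizeof(*state));
	/* XRSTORS expects the format bit plus the covered components. */
	state->header.xcomp_bv = COMPACTED_FORMAT_BIT | mask;
}
```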

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: kvm@vger.kernel.org
---
Changes from v5:
* Moved fpstate_init_xstate() back to the header (again).
* Massaged the changelog.

Changes from v4:
* Added a proper function description. (Borislav Petkov)
* Added the likely() statement as a null pointer is a special case.

Changes from v3:
* Updated the changelog. (Borislav Petkov)
* Updated the function comment to use kernel-doc style. (Borislav Petkov)

Changes from v2:
* Updated the changelog with task->fpu removed. (Borislav Petkov)
---
 arch/x86/include/asm/fpu/internal.h | 11 ++++++++++-
 arch/x86/kernel/fpu/core.c          | 28 +++++++++++++++++-----------
 arch/x86/kernel/fpu/init.c          |  2 +-
 arch/x86/kernel/fpu/xstate.c        |  3 +--
 arch/x86/kvm/x86.c                  |  2 +-
 5 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 5a18694a89b2..c7a64e2806a9 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -80,7 +80,7 @@ static __always_inline __pure bool use_fxsr(void)
 
 extern union fpregs_state init_fpstate;
 
-extern void fpstate_init(union fpregs_state *state);
+extern void fpstate_init(struct fpu *fpu);
 #ifdef CONFIG_MATH_EMULATION
 extern void fpstate_init_soft(struct swregs_state *soft);
 #else
@@ -88,6 +88,15 @@ static inline void fpstate_init_soft(struct swregs_state *soft) {}
 #endif
 extern void save_fpregs_to_fpstate(struct fpu *fpu);
 
+static inline void fpstate_init_xstate(struct xregs_state *xsave, u64 mask)
+{
+	/*
+	 * XRSTORS requires these bits set in xcomp_bv, or it will
+	 * trigger #GP:
+	 */
+	xsave->header.xcomp_bv = XCOMP_BV_COMPACTED_FORMAT | mask;
+}
+
 /* Returns 0 or the negated trap number, which results in -EFAULT for #PF */
 #define user_insn(insn, output, input...)				\
 ({									\
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 7ada7bd03a32..c0098f8422de 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -203,15 +203,6 @@ void fpu_sync_fpstate(struct fpu *fpu)
 	fpregs_unlock();
 }
 
-static inline void fpstate_init_xstate(struct xregs_state *xsave)
-{
-	/*
-	 * XRSTORS requires these bits set in xcomp_bv, or it will
-	 * trigger #GP:
-	 */
-	xsave->header.xcomp_bv = XCOMP_BV_COMPACTED_FORMAT | xfeatures_mask_all;
-}
-
 static inline void fpstate_init_fxstate(struct fxregs_state *fx)
 {
 	fx->cwd = 0x37f;
@@ -229,8 +220,23 @@ static inline void fpstate_init_fstate(struct fregs_state *fp)
 	fp->fos = 0xffff0000u;
 }
 
-void fpstate_init(union fpregs_state *state)
+/**
+ *
+ * fpstate_init - initialize the xstate buffer
+ *
+ * If @fpu is NULL, initialize init_fpstate.
+ *
+ * @fpu:	A struct fpu * pointer
+ */
+void fpstate_init(struct fpu *fpu)
 {
+	union fpregs_state *state;
+
+	if (likely(fpu))
+		state = &fpu->state;
+	else
+		state = &init_fpstate;
+
 	if (!static_cpu_has(X86_FEATURE_FPU)) {
 		fpstate_init_soft(&state->soft);
 		return;
@@ -239,7 +245,7 @@ void fpstate_init(union fpregs_state *state)
 	memset(state, 0, fpu_kernel_xstate_size);
 
 	if (static_cpu_has(X86_FEATURE_XSAVES))
-		fpstate_init_xstate(&state->xsave);
+		fpstate_init_xstate(&state->xsave, xfeatures_mask_all);
 	if (static_cpu_has(X86_FEATURE_FXSR))
 		fpstate_init_fxstate(&state->fxsave);
 	else
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 64e29927cc32..e14c72bc8706 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -124,7 +124,7 @@ static void __init fpu__init_system_generic(void)
 	 * Set up the legacy init FPU context. (xstate init might overwrite this
 	 * with a more modern format, if the CPU supports it.)
 	 */
-	fpstate_init(&init_fpstate);
+	fpstate_init(NULL);
 
 	fpu__init_system_mxcsr();
 }
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index fc1d529547e6..0fed7fbcf2e8 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -395,8 +395,7 @@ static void __init setup_init_fpu_buf(void)
 	print_xstate_features();
 
 	if (boot_cpu_has(X86_FEATURE_XSAVES))
-		init_fpstate.xsave.header.xcomp_bv = XCOMP_BV_COMPACTED_FORMAT |
-						     xfeatures_mask_all;
+		fpstate_init_xstate(&init_fpstate.xsave, xfeatures_mask_all);
 
 	/*
 	 * Init all the features state with header.xfeatures being 0x0
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 28ef14155726..3263567729ed 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10615,7 +10615,7 @@ static void fx_init(struct kvm_vcpu *vcpu)
 	if (!vcpu->arch.guest_fpu)
 		return;
 
-	fpstate_init(&vcpu->arch.guest_fpu->state);
+	fpstate_init(vcpu->arch.guest_fpu);
 	if (boot_cpu_has(X86_FEATURE_XSAVES))
 		vcpu->arch.guest_fpu->state.xsave.header.xcomp_bv =
 			host_xcr0 | XSTATE_COMPACTION_ENABLED;
-- 
2.17.1



* [PATCH v11 03/29] x86/fpu/xstate: Modify state copy helpers to handle both static and dynamic buffers
From: Chang S. Bae @ 2021-10-01 22:37 UTC
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, lenb, dave.hansen, thiago.macieira, jing2.liu,
	ravi.v.shankar, linux-kernel, chang.seok.bae

Have all the functions copying XSTATE take a struct fpu * pointer in
preparation for dynamic state buffer support.

No functional change.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v5:
* Adjusted function prototype changes to the recently renamed functions
  on the new base.

Changes from v3:
* Updated the changelog. (Borislav Petkov)

Changes from v2:
* Updated the changelog with task->fpu removed. (Borislav Petkov)
---
 arch/x86/include/asm/fpu/xstate.h |  4 ++--
 arch/x86/kernel/fpu/regset.c      |  2 +-
 arch/x86/kernel/fpu/signal.c      |  2 +-
 arch/x86/kernel/fpu/xstate.c      | 12 ++++++------
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 109dfcc75299..ede166e9d3f2 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -136,8 +136,8 @@ extern void __init update_regset_xstate_info(unsigned int size,
 
 void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr);
 int xfeature_size(int xfeature_nr);
-int copy_uabi_from_kernel_to_xstate(struct xregs_state *xsave, const void *kbuf);
-int copy_sigframe_from_user_to_xstate(struct xregs_state *xsave, const void __user *ubuf);
+int copy_uabi_from_kernel_to_xstate(struct fpu *fpu, const void *kbuf);
+int copy_sigframe_from_user_to_xstate(struct fpu *fpu, const void __user *ubuf);
 
 void xsaves(struct xregs_state *xsave, u64 mask);
 void xrstors(struct xregs_state *xsave, u64 mask);
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index 66ed317ebc0d..49dd307003ec 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -164,7 +164,7 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
 	}
 
 	fpu_force_restore(fpu);
-	ret = copy_uabi_from_kernel_to_xstate(&fpu->state.xsave, kbuf ?: tmpbuf);
+	ret = copy_uabi_from_kernel_to_xstate(fpu, kbuf ?: tmpbuf);
 
 out:
 	vfree(tmpbuf);
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 445c57c9c539..bec8c8046888 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -371,7 +371,7 @@ static int __fpu_restore_sig(void __user *buf, void __user *buf_fx,
 	fpregs_unlock();
 
 	if (use_xsave() && !fx_only) {
-		ret = copy_sigframe_from_user_to_xstate(&fpu->state.xsave, buf_fx);
+		ret = copy_sigframe_from_user_to_xstate(fpu, buf_fx);
 		if (ret)
 			return ret;
 	} else {
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 0fed7fbcf2e8..df39085e9d05 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1089,10 +1089,10 @@ static int copy_from_buffer(void *dst, unsigned int offset, unsigned int size,
 	return 0;
 }
 
-
-static int copy_uabi_to_xstate(struct xregs_state *xsave, const void *kbuf,
+static int copy_uabi_to_xstate(struct fpu *fpu, const void *kbuf,
 			       const void __user *ubuf)
 {
+	struct xregs_state *xsave = &fpu->state.xsave;
 	unsigned int offset, size;
 	struct xstate_header hdr;
 	u64 mask;
@@ -1161,9 +1161,9 @@ static int copy_uabi_to_xstate(struct xregs_state *xsave, const void *kbuf,
  * format and copy to the target thread. This is called from
  * xstateregs_set().
  */
-int copy_uabi_from_kernel_to_xstate(struct xregs_state *xsave, const void *kbuf)
+int copy_uabi_from_kernel_to_xstate(struct fpu *fpu, const void *kbuf)
 {
-	return copy_uabi_to_xstate(xsave, kbuf, NULL);
+	return copy_uabi_to_xstate(fpu, kbuf, NULL);
 }
 
 /*
@@ -1171,10 +1171,10 @@ int copy_uabi_from_kernel_to_xstate(struct xregs_state *xsave, const void *kbuf)
  * XSAVE[S] format and copy to the target thread. This is called from the
  * sigreturn() and rt_sigreturn() system calls.
  */
-int copy_sigframe_from_user_to_xstate(struct xregs_state *xsave,
+int copy_sigframe_from_user_to_xstate(struct fpu *fpu,
 				      const void __user *ubuf)
 {
-	return copy_uabi_to_xstate(xsave, NULL, ubuf);
+	return copy_uabi_to_xstate(fpu, NULL, ubuf);
 }
 
 static bool validate_xsaves_xrstors(u64 mask)
-- 
2.17.1



* [PATCH v11 04/29] x86/fpu/xstate: Modify address finders to handle both static and dynamic buffers
From: Chang S. Bae @ 2021-10-01 22:37 UTC
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, lenb, dave.hansen, thiago.macieira, jing2.liu,
	ravi.v.shankar, linux-kernel, chang.seok.bae, kvm

Have all the functions finding the XSTATE address take a struct fpu *
pointer in preparation for dynamic state buffer support.

init_fpstate is a special case, which is indicated by a null pointer
parameter to get_xsave_addr() and __raw_xsave_addr().

No functional change.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: kvm@vger.kernel.org
---
Changes from v5:
* Adjusted some call sites for the new base.

Changes from v3:
* Updated the changelog. (Borislav Petkov)
* Updated the function comment to use kernel-doc style. (Borislav Petkov)

Changes from v2:
* Updated the changelog with task->fpu removed. (Borislav Petkov)

Changes from v1:
* Rebased on the upstream kernel (5.10)
---
 arch/x86/include/asm/fpu/xstate.h |  2 +-
 arch/x86/kernel/fpu/xstate.c      | 42 ++++++++++++++++++++++++-------
 arch/x86/kvm/x86.c                | 10 +++-----
 3 files changed, 38 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index ede166e9d3f2..2451bccc6cac 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -134,7 +134,7 @@ extern u64 xstate_fx_sw_bytes[USER_XSTATE_FX_SW_WORDS];
 extern void __init update_regset_xstate_info(unsigned int size,
 					     u64 xstate_mask);
 
-void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr);
+void *get_xsave_addr(struct fpu *fpu, int xfeature_nr);
 int xfeature_size(int xfeature_nr);
 int copy_uabi_from_kernel_to_xstate(struct fpu *fpu, const void *kbuf);
 int copy_sigframe_from_user_to_xstate(struct fpu *fpu, const void __user *ubuf);
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index df39085e9d05..0a59df0c48e7 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -841,19 +841,34 @@ void fpu__resume_cpu(void)
 	}
 }
 
-/*
+/**
+ * __raw_xsave_addr - Find the address where the feature state is saved.
+ *
  * Given an xstate feature nr, calculate where in the xsave
  * buffer the state is.  Callers should ensure that the buffer
  * is valid.
+ *
+ * If @fpu is NULL, use init_fpstate.
+ *
+ * @fpu:	A struct fpu * pointer
+ *
+ * Return:	An address of the feature state in the buffer
  */
-static void *__raw_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
+static void *__raw_xsave_addr(struct fpu *fpu, int xfeature_nr)
 {
+	void *xsave;
+
 	if (!xfeature_enabled(xfeature_nr)) {
 		WARN_ON_FPU(1);
 		return NULL;
 	}
 
-	return (void *)xsave + xstate_comp_offsets[xfeature_nr];
+	if (fpu)
+		xsave = &fpu->state.xsave;
+	else
+		xsave = &init_fpstate.xsave;
+
+	return xsave + xstate_comp_offsets[xfeature_nr];
 }
 /*
  * Given the xsave area and a state inside, this function returns the
@@ -866,15 +881,18 @@ static void *__raw_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
  * this will return NULL.
  *
  * Inputs:
- *	xstate: the thread's storage area for all FPU data
+ *	fpu: the thread's FPU data to reference xstate buffer(s).
+ *	     (A null pointer parameter indicates init_fpstate.)
  *	xfeature_nr: state which is defined in xsave.h (e.g. XFEATURE_FP,
  *	XFEATURE_SSE, etc...)
  * Output:
  *	address of the state in the xsave area, or NULL if the
  *	field is not present in the xsave buffer.
  */
-void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
+void *get_xsave_addr(struct fpu *fpu, int xfeature_nr)
 {
+	struct xregs_state *xsave;
+
 	/*
 	 * Do we even *have* xsave state?
 	 */
@@ -887,6 +905,12 @@ void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
 	 */
 	WARN_ONCE(!(xfeatures_mask_all & BIT_ULL(xfeature_nr)),
 		  "get of unsupported state");
+
+	if (fpu)
+		xsave = &fpu->state.xsave;
+	else
+		xsave = &init_fpstate.xsave;
+
 	/*
 	 * This assumes the last 'xsave*' instruction to
 	 * have requested that 'xfeature_nr' be saved.
@@ -901,7 +925,7 @@ void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
 	if (!(xsave->header.xfeatures & BIT_ULL(xfeature_nr)))
 		return NULL;
 
-	return __raw_xsave_addr(xsave, xfeature_nr);
+	return __raw_xsave_addr(fpu, xfeature_nr);
 }
 EXPORT_SYMBOL_GPL(get_xsave_addr);
 
@@ -1061,8 +1085,8 @@ void copy_xstate_to_uabi_buf(struct membuf to, struct task_struct *tsk,
 			membuf_write(&to, &pkru, sizeof(pkru));
 		} else {
 			copy_feature(header.xfeatures & BIT_ULL(i), &to,
-				     __raw_xsave_addr(xsave, i),
-				     __raw_xsave_addr(xinit, i),
+				     __raw_xsave_addr(&tsk->thread.fpu, i),
+				     __raw_xsave_addr(NULL, i),
 				     xstate_sizes[i]);
 		}
 		/*
@@ -1129,7 +1153,7 @@ static int copy_uabi_to_xstate(struct fpu *fpu, const void *kbuf,
 		u64 mask = ((u64)1 << i);
 
 		if (hdr.xfeatures & mask) {
-			void *dst = __raw_xsave_addr(xsave, i);
+			void *dst = __raw_xsave_addr(fpu, i);
 
 			if (!dst)
 				continue;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3263567729ed..049ffc8305f9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4729,7 +4729,7 @@ static void fill_xsave(u8 *dest, struct kvm_vcpu *vcpu)
 			memcpy(dest + offset, &vcpu->arch.pkru,
 			       sizeof(vcpu->arch.pkru));
 		} else {
-			src = get_xsave_addr(xsave, xfeature_nr);
+			src = get_xsave_addr(vcpu->arch.guest_fpu, xfeature_nr);
 			if (src)
 				memcpy(dest + offset, src, size);
 		}
@@ -4772,7 +4772,7 @@ static void load_xsave(struct kvm_vcpu *vcpu, u8 *src)
 			memcpy(&vcpu->arch.pkru, src + offset,
 			       sizeof(vcpu->arch.pkru));
 		} else {
-			void *dest = get_xsave_addr(xsave, xfeature_nr);
+			void *dest = get_xsave_addr(vcpu->arch.guest_fpu, xfeature_nr);
 
 			if (dest)
 				memcpy(dest, src + offset, size);
@@ -10849,12 +10849,10 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 		 */
 		if (init_event)
 			kvm_put_guest_fpu(vcpu);
-		mpx_state_buffer = get_xsave_addr(&vcpu->arch.guest_fpu->state.xsave,
-					XFEATURE_BNDREGS);
+		mpx_state_buffer = get_xsave_addr(vcpu->arch.guest_fpu, XFEATURE_BNDREGS);
 		if (mpx_state_buffer)
 			memset(mpx_state_buffer, 0, sizeof(struct mpx_bndreg_state));
-		mpx_state_buffer = get_xsave_addr(&vcpu->arch.guest_fpu->state.xsave,
-					XFEATURE_BNDCSR);
+		mpx_state_buffer = get_xsave_addr(vcpu->arch.guest_fpu, XFEATURE_BNDCSR);
 		if (mpx_state_buffer)
 			memset(mpx_state_buffer, 0, sizeof(struct mpx_bndcsr));
 		if (init_event)
-- 
2.17.1



* [PATCH v11 05/29] x86/fpu/xstate: Add a new variable to indicate dynamic user states
  2021-10-01 22:36 [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (3 preceding siblings ...)
  2021-10-01 22:37 ` [PATCH v11 04/29] x86/fpu/xstate: Modify address finders " Chang S. Bae
@ 2021-10-01 22:37 ` Chang S. Bae
  2021-10-01 22:37 ` [PATCH v11 06/29] x86/fpu/xstate: Add new variables to indicate dynamic XSTATE buffer size Chang S. Bae
                   ` (26 subsequent siblings)
  31 siblings, 0 replies; 45+ messages in thread
From: Chang S. Bae @ 2021-10-01 22:37 UTC
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, lenb, dave.hansen, thiago.macieira, jing2.liu,
	ravi.v.shankar, linux-kernel, chang.seok.bae

In preparation for making the XSTATE per-task buffer dynamic for user
states, introduce a new mask variable to indicate the 'dynamic' user
states. Its value is determined at boot time.
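
How such a mask is meant to be used can be sketched as follows: the
states that fit the statically sized buffer are the enabled ones minus
the dynamic ones, which is what fpstate_init() computes later in this
series. The feature numbers 17/18 (AMX TILE config/data) and the choice
of only TILE data as dynamic are assumptions for illustration; in this
patch the mask is still 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT_ULL(n) (1ULL << (n))

/* Assumed feature numbers, used for illustration only. */
#define XFEATURE_XTILE_CFG  17
#define XFEATURE_XTILE_DATA 18

/* States whose storage fits the statically sized per-task buffer. */
static uint64_t static_xfeatures(uint64_t mask_all, uint64_t mask_user_dynamic)
{
	/* The dynamic mask must be a subset of all enabled features. */
	assert((mask_user_dynamic & ~mask_all) == 0);
	return mask_all & ~mask_user_dynamic;
}

/* True if feature @nr is saved in a dynamically allocated buffer. */
static bool xfeature_is_dynamic(uint64_t mask_user_dynamic, int nr)
{
	return (mask_user_dynamic & BIT_ULL(nr)) != 0;
}
```

With a zero dynamic mask, as set by this patch, static_xfeatures()
returns the full enabled mask, so behavior is unchanged.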

No functional change.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v5:
* Made the variable __ro_after_init.
* Dropped the perf's xstate buffer renaming, as renamed already.

Changes from v3:
* Updated the changelog. (Borislav Petkov)
* Updated the code comment. (Borislav Petkov)

Changes from v2:
* Updated the changelog for clarification.
---
 arch/x86/include/asm/fpu/xstate.h | 2 ++
 arch/x86/kernel/fpu/xstate.c      | 9 +++++++++
 2 files changed, 11 insertions(+)

diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 2451bccc6cac..bc4cba62906b 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -129,6 +129,8 @@ static inline u64 xfeatures_mask_independent(void)
 	return XFEATURE_MASK_INDEPENDENT;
 }
 
+extern u64 xfeatures_mask_user_dynamic;
+
 extern u64 xstate_fx_sw_bytes[USER_XSTATE_FX_SW_WORDS];
 
 extern void __init update_regset_xstate_info(unsigned int size,
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 0a59df0c48e7..6a658ef913bd 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -62,6 +62,12 @@ static short xsave_cpuid_features[] __initdata = {
 u64 xfeatures_mask_all __ro_after_init;
 EXPORT_SYMBOL_GPL(xfeatures_mask_all);
 
+/*
+ * This represents user xstates, a subset of xfeatures_mask_all, saved in a
+ * dynamic kernel XSAVE buffer.
+ */
+u64 xfeatures_mask_user_dynamic __ro_after_init;
+
 static unsigned int xstate_offsets[XFEATURE_MAX] __ro_after_init =
 	{ [ 0 ... XFEATURE_MAX - 1] = -1};
 static unsigned int xstate_sizes[XFEATURE_MAX] __ro_after_init =
@@ -709,6 +715,7 @@ static int __init init_xstate_size(void)
 static void fpu__init_disable_system_xstate(void)
 {
 	xfeatures_mask_all = 0;
+	xfeatures_mask_user_dynamic = 0;
 	cr4_clear_bits(X86_CR4_OSXSAVE);
 	setup_clear_cpu_cap(X86_FEATURE_XSAVE);
 }
@@ -780,6 +787,8 @@ void __init fpu__init_system_xstate(void)
 
 	/* Store it for paranoia check at the end */
 	xfeatures = xfeatures_mask_all;
+	/* Do not support the dynamically allocated buffer yet. */
+	xfeatures_mask_user_dynamic = 0;
 
 	/* Enable xstate instructions to be able to continue with initialization: */
 	fpu__init_cpu_xstate();
-- 
2.17.1



* [PATCH v11 06/29] x86/fpu/xstate: Add new variables to indicate dynamic XSTATE buffer size
  2021-10-01 22:36 [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (4 preceding siblings ...)
  2021-10-01 22:37 ` [PATCH v11 05/29] x86/fpu/xstate: Add a new variable to indicate dynamic user states Chang S. Bae
@ 2021-10-01 22:37 ` Chang S. Bae
  2021-10-01 22:37 ` [PATCH v11 07/29] x86/fpu/xstate: Calculate and remember dynamic XSTATE buffer sizes Chang S. Bae
                   ` (25 subsequent siblings)
  31 siblings, 0 replies; 45+ messages in thread
From: Chang S. Bae @ 2021-10-01 22:37 UTC
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, lenb, dave.hansen, thiago.macieira, jing2.liu,
	ravi.v.shankar, linux-kernel, chang.seok.bae, kvm

In preparation for making the XSTATE per-task buffer dynamic for user
states, introduce new size variables to indicate the minimum and maximum
sizes of the buffer. Their values are determined at boot time.

Rather than exporting each new variable individually, group them, along
with the user buffer size, into a single exported struct
fpu_xstate_buffer_config instance, fpu_buf_cfg.

No functional change: the minimum and maximum sizes are still identical,
as the buffer is not dynamic yet.
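
A minimal sketch of how the three sizes relate at this point in the
series, assuming simplified types: the setup helper and the sizes used
with it are hypothetical, while the sigframe calculation mirrors
xstate_sigframe_size() (user size plus the 4-byte FP_XSTATE_MAGIC2 word
when XSAVE is in use).

```c
#include <assert.h>

/* Mirrors the fields added by this patch. */
struct fpu_xstate_buffer_config {
	unsigned int min_size, max_size;
	unsigned int user_size;
};

/*
 * Hypothetical boot-time setup: the buffer is not dynamic yet, so the
 * minimum equals the maximum. @uabi_size is the standard-format size
 * used for signal and ptrace frames.
 */
static void setup_buf_cfg(struct fpu_xstate_buffer_config *cfg,
			  unsigned int kernel_size, unsigned int uabi_size)
{
	cfg->max_size = kernel_size;
	cfg->min_size = kernel_size;
	cfg->user_size = uabi_size;
	assert(cfg->min_size <= cfg->max_size);
}

/* Size of the xstate area in a signal frame. */
static unsigned int sigframe_xstate_size(const struct fpu_xstate_buffer_config *cfg,
					 int use_xsave)
{
	const unsigned int magic2_size = 4;	/* sizeof(FP_XSTATE_MAGIC2), a u32 */

	return use_xsave ? cfg->user_size + magic2_size : cfg->user_size;
}
```

Keeping the kernel-side sizes (min/max) separate from the user size
matters because the kernel buffer may use the compacted format while
user space always sees the standard format.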

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: kvm@vger.kernel.org
---
Changes from v9:
* Remove access helpers. (Borislav Petkov)

Changes from v6:
* Massage the code comment.

Changes from v5:
* Made the new variables __ro_after_init for the new base code.
* Fixed the init_fpstate size for memset().

Changes from v3:
* Added as a new patch to add the variables along with new helpers.
  (Borislav Petkov)
---
 arch/x86/include/asm/fpu/xstate.h | 17 +++++++++++++
 arch/x86/include/asm/processor.h  | 10 +-------
 arch/x86/kernel/fpu/core.c        | 26 +++++++++++++------
 arch/x86/kernel/fpu/init.c        | 26 ++++++++-----------
 arch/x86/kernel/fpu/regset.c      |  2 +-
 arch/x86/kernel/fpu/signal.c      | 25 ++++++++++--------
 arch/x86/kernel/fpu/xstate.c      | 42 ++++++++++++++++++-------------
 arch/x86/kernel/process.c         |  7 ++++++
 arch/x86/kvm/x86.c                |  5 +++-
 9 files changed, 98 insertions(+), 62 deletions(-)

diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index bc4cba62906b..c4a0914b7717 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -136,6 +136,23 @@ extern u64 xstate_fx_sw_bytes[USER_XSTATE_FX_SW_WORDS];
 extern void __init update_regset_xstate_info(unsigned int size,
 					     u64 xstate_mask);
 
+/**
+ * struct fpu_xstate_buffer_config - xstate buffer configuration
+ * @max_size:			The CPUID-enumerated all-feature "maximum" size
+ *				for xstate per-task buffer.
+ * @min_size:			The size to fit into the statically-allocated
+ *				buffer. With dynamic states, this buffer no longer
+ *				contains all the enabled state components.
+ * @user_size:			The size of user-space buffer for signal and
+ *				ptrace frames, in the non-compacted format.
+ */
+struct fpu_xstate_buffer_config {
+	unsigned int min_size, max_size;
+	unsigned int user_size;
+};
+
+extern struct fpu_xstate_buffer_config fpu_buf_cfg;
+
 void *get_xsave_addr(struct fpu *fpu, int xfeature_nr);
 int xfeature_size(int xfeature_nr);
 int copy_uabi_from_kernel_to_xstate(struct fpu *fpu, const void *kbuf);
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 9ad2acaaae9b..4a9abbb70987 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -461,9 +461,6 @@ DECLARE_PER_CPU(struct irq_stack *, hardirq_stack_ptr);
 DECLARE_PER_CPU(struct irq_stack *, softirq_stack_ptr);
 #endif	/* !X86_64 */
 
-extern unsigned int fpu_kernel_xstate_size;
-extern unsigned int fpu_user_xstate_size;
-
 struct perf_event;
 
 struct thread_struct {
@@ -538,12 +535,7 @@ struct thread_struct {
 };
 
 /* Whitelist the FPU state from the task_struct for hardened usercopy. */
-static inline void arch_thread_struct_whitelist(unsigned long *offset,
-						unsigned long *size)
-{
-	*offset = offsetof(struct thread_struct, fpu.state);
-	*size = fpu_kernel_xstate_size;
-}
+extern void arch_thread_struct_whitelist(unsigned long *offset, unsigned long *size);
 
 static inline void
 native_load_sp0(unsigned long sp0)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index c0098f8422de..62cc993a890a 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -231,21 +231,30 @@ static inline void fpstate_init_fstate(struct fregs_state *fp)
 void fpstate_init(struct fpu *fpu)
 {
 	union fpregs_state *state;
+	unsigned int size;
+	u64 mask;
 
-	if (likely(fpu))
+	if (likely(fpu)) {
 		state = &fpu->state;
-	else
+		/* The dynamic user states are not prepared yet. */
+		mask = xfeatures_mask_all & ~xfeatures_mask_user_dynamic;
+		size = fpu_buf_cfg.min_size;
+	} else {
 		state = &init_fpstate;
+		mask = xfeatures_mask_all;
+		size = sizeof(init_fpstate);
+	}
 
 	if (!static_cpu_has(X86_FEATURE_FPU)) {
 		fpstate_init_soft(&state->soft);
 		return;
 	}
 
-	memset(state, 0, fpu_kernel_xstate_size);
+	memset(state, 0, size);
 
 	if (static_cpu_has(X86_FEATURE_XSAVES))
-		fpstate_init_xstate(&state->xsave, xfeatures_mask_all);
+		fpstate_init_xstate(&state->xsave, mask);
+
 	if (static_cpu_has(X86_FEATURE_FXSR))
 		fpstate_init_fxstate(&state->fxsave);
 	else
@@ -268,8 +277,11 @@ int fpu_clone(struct task_struct *dst)
 	/*
 	 * Don't let 'init optimized' areas of the XSAVE area
 	 * leak into the child task:
+	 *
+	 * The child does not inherit the dynamic states. So,
+	 * the xstate buffer has the minimum size.
 	 */
-	memset(&dst_fpu->state.xsave, 0, fpu_kernel_xstate_size);
+	memset(&dst_fpu->state.xsave, 0, fpu_buf_cfg.min_size);
 
 	/*
 	 * If the FPU registers are not owned by current just memcpy() the
@@ -278,7 +290,7 @@ int fpu_clone(struct task_struct *dst)
 	 */
 	fpregs_lock();
 	if (test_thread_flag(TIF_NEED_FPU_LOAD))
-		memcpy(&dst_fpu->state, &src_fpu->state, fpu_kernel_xstate_size);
+		memcpy(&dst_fpu->state, &src_fpu->state, fpu_buf_cfg.min_size);
 
 	else
 		save_fpregs_to_fpstate(dst_fpu);
@@ -337,7 +349,7 @@ static inline void restore_fpregs_from_init_fpstate(u64 features_mask)
 static inline unsigned int init_fpstate_copy_size(void)
 {
 	if (!use_xsave())
-		return fpu_kernel_xstate_size;
+		return fpu_buf_cfg.min_size;
 
 	/* XSAVE(S) just needs the legacy and the xstate header part */
 	return sizeof(init_fpstate.xsave);
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index e14c72bc8706..da7341f95008 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -129,15 +129,6 @@ static void __init fpu__init_system_generic(void)
 	fpu__init_system_mxcsr();
 }
 
-/*
- * Size of the FPU context state. All tasks in the system use the
- * same context size, regardless of what portion they use.
- * This is inherent to the XSAVE architecture which puts all state
- * components into a single, continuous memory block:
- */
-unsigned int fpu_kernel_xstate_size __ro_after_init;
-EXPORT_SYMBOL_GPL(fpu_kernel_xstate_size);
-
 /* Get alignment of the TYPE. */
 #define TYPE_ALIGN(TYPE) offsetof(struct { char x; TYPE test; }, test)
 
@@ -167,8 +158,10 @@ static void __init fpu__init_task_struct_size(void)
 	/*
 	 * Add back the dynamically-calculated register state
 	 * size.
+	 *
+	 * Use the minimum size as embedded in task_struct.
 	 */
-	task_size += fpu_kernel_xstate_size;
+	task_size += fpu_buf_cfg.min_size;
 
 	/*
 	 * We dynamically size 'struct fpu', so we require that
@@ -193,6 +186,7 @@ static void __init fpu__init_task_struct_size(void)
 static void __init fpu__init_system_xstate_size_legacy(void)
 {
 	static int on_boot_cpu __initdata = 1;
+	unsigned int xstate_size;
 
 	WARN_ON_FPU(!on_boot_cpu);
 	on_boot_cpu = 0;
@@ -203,17 +197,17 @@ static void __init fpu__init_system_xstate_size_legacy(void)
 	 */
 
 	if (!boot_cpu_has(X86_FEATURE_FPU)) {
-		fpu_kernel_xstate_size = sizeof(struct swregs_state);
+		xstate_size = sizeof(struct swregs_state);
 	} else {
 		if (boot_cpu_has(X86_FEATURE_FXSR))
-			fpu_kernel_xstate_size =
-				sizeof(struct fxregs_state);
+			xstate_size = sizeof(struct fxregs_state);
 		else
-			fpu_kernel_xstate_size =
-				sizeof(struct fregs_state);
+			xstate_size = sizeof(struct fregs_state);
 	}
 
-	fpu_user_xstate_size = fpu_kernel_xstate_size;
+	fpu_buf_cfg.min_size = xstate_size;
+	fpu_buf_cfg.max_size = xstate_size;
+	fpu_buf_cfg.user_size = xstate_size;
 }
 
 /* Legacy code to initialize eager fpu mode. */
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index 49dd307003ec..80ee64183c7d 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -149,7 +149,7 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
 	/*
 	 * A whole standard-format XSAVE buffer is needed:
 	 */
-	if (pos != 0 || count != fpu_user_xstate_size)
+	if (pos != 0 || count != fpu_buf_cfg.user_size)
 		return -EFAULT;
 
 	if (!kbuf) {
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index bec8c8046888..f5ec334c5a4e 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -36,7 +36,7 @@ static inline int check_xstate_in_sigframe(struct fxregs_state __user *fxbuf,
 	/* Check for the first magic field and other error scenarios. */
 	if (fx_sw->magic1 != FP_XSTATE_MAGIC1 ||
 	    fx_sw->xstate_size < min_xstate_size ||
-	    fx_sw->xstate_size > fpu_user_xstate_size ||
+	    fx_sw->xstate_size > fpu_buf_cfg.user_size ||
 	    fx_sw->xstate_size > fx_sw->extended_size)
 		goto setfx;
 
@@ -107,7 +107,7 @@ static inline int save_xstate_epilog(void __user *buf, int ia32_frame)
 		return err;
 
 	err |= __put_user(FP_XSTATE_MAGIC2,
-			  (__u32 __user *)(buf + fpu_user_xstate_size));
+			  (__u32 __user *)(buf + fpu_buf_cfg.user_size));
 
 	/*
 	 * Read the xfeatures which we copied (directly from the cpu or
@@ -144,7 +144,7 @@ static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf)
 	else
 		err = fnsave_to_user_sigframe((struct fregs_state __user *) buf);
 
-	if (unlikely(err) && __clear_user(buf, fpu_user_xstate_size))
+	if (unlikely(err) && __clear_user(buf, fpu_buf_cfg.user_size))
 		err = -EFAULT;
 	return err;
 }
@@ -205,7 +205,7 @@ int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
 	fpregs_unlock();
 
 	if (ret) {
-		if (!fault_in_pages_writeable(buf_fx, fpu_user_xstate_size))
+		if (!fault_in_pages_writeable(buf_fx, fpu_buf_cfg.user_size))
 			goto retry;
 		return -EFAULT;
 	}
@@ -304,12 +304,12 @@ static int restore_fpregs_from_user(void __user *buf, u64 xrestore,
 static int __fpu_restore_sig(void __user *buf, void __user *buf_fx,
 			     bool ia32_fxstate)
 {
-	int state_size = fpu_kernel_xstate_size;
 	struct task_struct *tsk = current;
 	struct fpu *fpu = &tsk->thread.fpu;
 	struct user_i387_ia32_struct env;
 	u64 user_xfeatures = 0;
 	bool fx_only = false;
+	int state_size;
 	int ret;
 
 	if (use_xsave()) {
@@ -323,6 +323,8 @@ static int __fpu_restore_sig(void __user *buf, void __user *buf_fx,
 		state_size = fx_sw_user.xstate_size;
 		user_xfeatures = fx_sw_user.xfeatures;
 	} else {
+		/* The buffer cannot be dynamic without using XSAVE. */
+		state_size = fpu_buf_cfg.min_size;
 		user_xfeatures = XFEATURE_MASK_FPSSE;
 	}
 
@@ -418,8 +420,8 @@ static int __fpu_restore_sig(void __user *buf, void __user *buf_fx,
 }
 static inline int xstate_sigframe_size(void)
 {
-	return use_xsave() ? fpu_user_xstate_size + FP_XSTATE_MAGIC2_SIZE :
-			fpu_user_xstate_size;
+	return use_xsave() ? fpu_buf_cfg.user_size + FP_XSTATE_MAGIC2_SIZE :
+			fpu_buf_cfg.user_size;
 }
 
 /*
@@ -514,19 +516,20 @@ unsigned long fpu__get_fpstate_size(void)
  */
 void fpu__init_prepare_fx_sw_frame(void)
 {
-	int size = fpu_user_xstate_size + FP_XSTATE_MAGIC2_SIZE;
+	int ext_size = fpu_buf_cfg.user_size + FP_XSTATE_MAGIC2_SIZE;
+	int xstate_size = fpu_buf_cfg.user_size;
 
 	fx_sw_reserved.magic1 = FP_XSTATE_MAGIC1;
-	fx_sw_reserved.extended_size = size;
+	fx_sw_reserved.extended_size = ext_size;
 	fx_sw_reserved.xfeatures = xfeatures_mask_uabi();
-	fx_sw_reserved.xstate_size = fpu_user_xstate_size;
+	fx_sw_reserved.xstate_size = xstate_size;
 
 	if (IS_ENABLED(CONFIG_IA32_EMULATION) ||
 	    IS_ENABLED(CONFIG_X86_32)) {
 		int fsave_header_size = sizeof(struct fregs_state);
 
 		fx_sw_reserved_ia32 = fx_sw_reserved;
-		fx_sw_reserved_ia32.extended_size = size + fsave_header_size;
+		fx_sw_reserved_ia32.extended_size = ext_size + fsave_header_size;
 	}
 }
 
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 6a658ef913bd..058dc9df6b86 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -77,12 +77,8 @@ static unsigned int xstate_comp_offsets[XFEATURE_MAX] __ro_after_init =
 static unsigned int xstate_supervisor_only_offsets[XFEATURE_MAX] __ro_after_init =
 	{ [ 0 ... XFEATURE_MAX - 1] = -1};
 
-/*
- * The XSAVE area of kernel can be in standard or compacted format;
- * it is always in standard format for user mode. This is the user
- * mode standard format size used for signal and ptrace frames.
- */
-unsigned int fpu_user_xstate_size __ro_after_init;
+struct fpu_xstate_buffer_config fpu_buf_cfg __ro_after_init;
+EXPORT_SYMBOL_GPL(fpu_buf_cfg);
 
 /*
  * Return whether the system supports a given xfeature.
@@ -595,7 +591,11 @@ static void do_extra_xstate_size_checks(void)
 		 */
 		paranoid_xstate_size += xfeature_size(i);
 	}
-	XSTATE_WARN_ON(paranoid_xstate_size != fpu_kernel_xstate_size);
+	/*
+	 * The size accounts for all the possible states reserved in the
+	 * per-task buffer.  Check against the maximum size.
+	 */
+	XSTATE_WARN_ON(paranoid_xstate_size != fpu_buf_cfg.max_size);
 }
 
 
@@ -690,21 +690,29 @@ static int __init init_xstate_size(void)
 	else
 		possible_xstate_size = xsave_size;
 
-	/* Ensure we have the space to store all enabled: */
-	if (!is_supported_xstate_size(possible_xstate_size))
-		return -EINVAL;
-
 	/*
-	 * The size is OK, we are definitely going to use xsave,
-	 * make it known to the world that we need more space.
+	 * The size accounts for all the possible states reserved in the
+	 * per-task buffer.  Set the maximum with this value.
 	 */
-	fpu_kernel_xstate_size = possible_xstate_size;
+	fpu_buf_cfg.max_size = possible_xstate_size;
+
+	/* Perform an extra check for the maximum size. */
 	do_extra_xstate_size_checks();
 
+	/*
+	 * Set the minimum to be the same as the maximum. The dynamic
+	 * user states are not supported yet.
+	 */
+	fpu_buf_cfg.min_size = possible_xstate_size;
+
+	/* Ensure the minimum size fits in the statically-allocated buffer: */
+	if (!is_supported_xstate_size(fpu_buf_cfg.min_size))
+		return -EINVAL;
+
 	/*
 	 * User space is always in standard format.
 	 */
-	fpu_user_xstate_size = xsave_size;
+	fpu_buf_cfg.user_size = xsave_size;
 	return 0;
 }
 
@@ -800,7 +808,7 @@ void __init fpu__init_system_xstate(void)
 	 * Update info used for ptrace frames; use standard-format size and no
 	 * supervisor xstates:
 	 */
-	update_regset_xstate_info(fpu_user_xstate_size, xfeatures_mask_uabi());
+	update_regset_xstate_info(fpu_buf_cfg.user_size, xfeatures_mask_uabi());
 
 	fpu__init_prepare_fx_sw_frame();
 	setup_init_fpu_buf();
@@ -820,7 +828,7 @@ void __init fpu__init_system_xstate(void)
 	print_xstate_offset_size();
 	pr_info("x86/fpu: Enabled xstate features 0x%llx, context size is %d bytes, using '%s' format.\n",
 		xfeatures_mask_all,
-		fpu_kernel_xstate_size,
+		fpu_buf_cfg.max_size,
 		boot_cpu_has(X86_FEATURE_XSAVES) ? "compacted" : "standard");
 	return;
 
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 1d9463e3096b..d1ca963cb8f7 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -90,6 +90,13 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 	return fpu_clone(dst);
 }
 
+void arch_thread_struct_whitelist(unsigned long *offset, unsigned long *size)
+{
+	*offset = offsetof(struct thread_struct, fpu.state);
+	/* The buffer embedded in thread_struct has the minimum size. */
+	*size = fpu_buf_cfg.min_size;
+}
+
 /*
  * Free thread data structures etc..
  */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 049ffc8305f9..948c207a8752 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9904,10 +9904,13 @@ static void kvm_save_current_fpu(struct fpu *fpu)
 	/*
 	 * If the target FPU state is not resident in the CPU registers, just
 	 * memcpy() from current, else save CPU state directly to the target.
+	 *
+	 * KVM does not support dynamic user states yet. Assume the buffer
+	 * always has the minimum size.
 	 */
 	if (test_thread_flag(TIF_NEED_FPU_LOAD))
 		memcpy(&fpu->state, &current->thread.fpu.state,
-		       fpu_kernel_xstate_size);
+		       fpu_buf_cfg.min_size);
 	else
 		save_fpregs_to_fpstate(fpu);
 }
-- 
2.17.1



* [PATCH v11 07/29] x86/fpu/xstate: Calculate and remember dynamic XSTATE buffer sizes
  2021-10-01 22:36 [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (5 preceding siblings ...)
  2021-10-01 22:37 ` [PATCH v11 06/29] x86/fpu/xstate: Add new variables to indicate dynamic XSTATE buffer size Chang S. Bae
@ 2021-10-01 22:37 ` Chang S. Bae
  2021-10-01 22:37 ` [PATCH v11 08/29] x86/fpu/xstate: Convert the struct fpu 'state' field to a pointer Chang S. Bae
                   ` (24 subsequent siblings)
  31 siblings, 0 replies; 45+ messages in thread
From: Chang S. Bae @ 2021-10-01 22:37 UTC
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, lenb, dave.hansen, thiago.macieira, jing2.liu,
	ravi.v.shankar, linux-kernel, chang.seok.bae

The CPUID instruction separately enumerates the sizes and alignments of
individual xfeatures. It also independently enumerates the size required
for an entire XSAVE buffer that stores all enabled features.

do_extra_xstate_size_checks() currently uses the individual feature
size/alignment enumeration to independently recalculate the required XSAVE
buffer size.

The XSTATE per-task buffer is currently embedded into struct fpu with a
static size. To accommodate dynamic user XSTATEs, record both the maximum
and the minimum buffer sizes.

Rename the function to calculate_xstate_size() and extend it with an
option to exclude dynamic states. With that, calculate the maximum size,
which covers all the enabled states, and the minimum size, which excludes
the dynamic states so that it fits in the embedded buffer.

Also, move the size comparison with the CPUID value out to the call site.
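
The calculation and the call-site checks can be sketched in user space as
below. This is a simplified model, not the kernel code: it covers only
the compacted (XSAVES) format, the feature table (example_features) and
its sizes are hypothetical, and init_sizes() and struct size_result are
illustrative names. As init_xstate_size() does in this patch, the maximum
is taken from the CPUID-derived value and a mismatch is only flagged,
while a minimum that does not fit the static buffer is an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FXSAVE_SIZE    512	/* legacy FXSAVE region */
#define XSAVE_HDR_SIZE 64	/* XSAVE header */
#define FIRST_EXTENDED 2	/* features 0/1 (FP, SSE) are in the legacy region */
#define NR_XFEATURES   5	/* hypothetical feature count */
#define ALIGN64(x)     (((x) + 63u) & ~63u)
#define BIT_ULL(n)     (1ULL << (n))

struct xfeature_info {
	bool enabled;
	bool aligned;		/* needs 64-byte alignment when compacted */
	unsigned int size;
};

/* Hypothetical feature table, indexed by feature number. */
static const struct xfeature_info example_features[NR_XFEATURES] = {
	[2] = { .enabled = true, .aligned = false, .size = 256  },
	[3] = { .enabled = true, .aligned = true,  .size = 40   },
	[4] = { .enabled = true, .aligned = true,  .size = 8192 },
};

/* Compacted-format size, optionally skipping the dynamic states. */
static unsigned int calc_compacted_size(const struct xfeature_info *f,
					uint64_t dynamic_mask, bool include_dynamic)
{
	unsigned int size = FXSAVE_SIZE + XSAVE_HDR_SIZE;

	for (int i = FIRST_EXTENDED; i < NR_XFEATURES; i++) {
		if (!f[i].enabled)
			continue;
		if ((dynamic_mask & BIT_ULL(i)) && !include_dynamic)
			continue;
		/* Align from the end of the previous feature. */
		if (f[i].aligned)
			size = ALIGN64(size);
		size += f[i].size;
	}
	return size;
}

struct size_result {
	unsigned int min_size, max_size;
	bool cpuid_mismatch;
};

/* Returns 0 on success, -1 if the minimum exceeds the static buffer. */
static int init_sizes(const struct xfeature_info *f, uint64_t dynamic_mask,
		      unsigned int cpuid_size, unsigned int static_capacity,
		      struct size_result *out)
{
	/* Trust the CPUID-derived maximum; only flag a mismatch. */
	out->max_size = cpuid_size;
	out->cpuid_mismatch =
		calc_compacted_size(f, dynamic_mask, true) != cpuid_size;

	/* The minimum excludes dynamic states and must fit the embedded buffer. */
	unsigned int min = calc_compacted_size(f, dynamic_mask, false);
	if (min > static_capacity)
		return -1;
	out->min_size = min;
	return 0;
}
```

With feature 4 marked dynamic, the maximum in this model is 9088 bytes
(576 + 256, aligned to 832, + 40, aligned to 896, + 8192) while the
minimum is 872 bytes, so only the minimum has to fit the embedded buffer.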

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v9:
* Update the changelog and the code comment. (Borislav Petkov)

Changes from v6:
* Simplify xstate size calculation code. (Dave Hansen)
* Updated the changelog. (Dave Hansen)
* Fixed the v6 changes.

Changes from v5:
* Re-adjusted some local variable names.

Changes from v4:
* Massaged the function description, in preparation for the change
  with a return value.

Changes from v3:
* Updated the changelog. (Borislav Petkov)
* Updated the code comment. (Borislav Petkov)
* Adjusted the calculation function naming.
* Moved out the new variable addition into a new patch.

Changes from v2:
* Updated the changelog with task->fpu removed. (Borislav Petkov)
* Renamed the in-line size variable.
* Updated some code comments.
---
 arch/x86/kernel/fpu/xstate.c | 61 ++++++++++++++++++------------------
 1 file changed, 31 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 058dc9df6b86..2e474fbdc241 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -548,24 +548,31 @@ static void check_xstate_against_struct(int nr)
 	}
 }
 
-/*
- * This essentially double-checks what the cpu told us about
- * how large the XSAVE buffer needs to be.  We are recalculating
- * it to be safe.
+/**
+ * calculate_xstate_size - Calculate the xstate per-task buffer size.
+ *
+ * This essentially double-checks what the CPU told us about how large the
+ * XSAVE buffer needs to be. We are recalculating it to be safe.
+ *
+ * Independent XSAVE features allocate their own buffers and are always
+ * excluded. Only the size of the buffer for task->fpu is checked here.
  *
- * Independent XSAVE features allocate their own buffers and are not
- * covered by these checks. Only the size of the buffer for task->fpu
- * is checked here.
+ * @include_dynamic_states:	A knob to include dynamic states or not.
+ *
+ * Return:			The calculated xstate size.
  */
-static void do_extra_xstate_size_checks(void)
+static unsigned int calculate_xstate_size(bool include_dynamic_states)
 {
-	int paranoid_xstate_size = FXSAVE_SIZE + XSAVE_HDR_SIZE;
+	unsigned int xstate_size = FXSAVE_SIZE + XSAVE_HDR_SIZE;
 	int i;
 
 	for (i = FIRST_EXTENDED_XFEATURE; i < XFEATURE_MAX; i++) {
 		if (!xfeature_enabled(i))
 			continue;
 
+		if ((xfeatures_mask_user_dynamic & BIT_ULL(i)) && !include_dynamic_states)
+			continue;
+
 		check_xstate_against_struct(i);
 		/*
 		 * Supervisor state components can be managed only by
@@ -576,7 +583,7 @@ static void do_extra_xstate_size_checks(void)
 
 		/* Align from the end of the previous feature */
 		if (xfeature_is_aligned(i))
-			paranoid_xstate_size = ALIGN(paranoid_xstate_size, 64);
+			xstate_size = ALIGN(xstate_size, 64);
 		/*
 		 * The offset of a given state in the non-compacted
 		 * format is given to us in a CPUID leaf.  We check
@@ -584,20 +591,16 @@ static void do_extra_xstate_size_checks(void)
 		 * setup_xstate_features(). XSAVES uses compacted format.
 		 */
 		if (!cpu_feature_enabled(X86_FEATURE_XSAVES))
-			paranoid_xstate_size = xfeature_uncompacted_offset(i);
+			xstate_size = xfeature_uncompacted_offset(i);
 		/*
 		 * The compacted-format offset always depends on where
 		 * the previous state ended.
 		 */
-		paranoid_xstate_size += xfeature_size(i);
+		xstate_size += xfeature_size(i);
 	}
-	/*
-	 * The size accounts for all the possible states reserved in the
-	 * per-task buffer.  Check against the maximum size.
-	 */
-	XSTATE_WARN_ON(paranoid_xstate_size != fpu_buf_cfg.max_size);
-}
 
+	return xstate_size;
+}
 
 /*
  * Get total size of enabled xstates in XCR0 | IA32_XSS.
@@ -680,7 +683,7 @@ static bool is_supported_xstate_size(unsigned int test_xstate_size)
 static int __init init_xstate_size(void)
 {
 	/* Recompute the context size for enabled features: */
-	unsigned int possible_xstate_size;
+	unsigned int possible_xstate_size, xstate_size;
 	unsigned int xsave_size;
 
 	xsave_size = get_xsave_size();
@@ -691,24 +694,22 @@ static int __init init_xstate_size(void)
 		possible_xstate_size = xsave_size;
 
 	/*
-	 * The size accounts for all the possible states reserved in the
-	 * per-task buffer.  Set the maximum with this value.
+	 * Calculate the maximum xstate size, including the dynamic states.
 	 */
 	fpu_buf_cfg.max_size = possible_xstate_size;
-
-	/* Perform an extra check for the maximum size. */
-	do_extra_xstate_size_checks();
+	xstate_size = calculate_xstate_size(true);
+	XSTATE_WARN_ON(possible_xstate_size != xstate_size);
 
 	/*
-	 * Set the minimum to be the same as the maximum. The dynamic
-	 * user states are not supported yet.
+	 * Calculate the minimum xstate size, i.e., excluding the dynamic
+	 * xstates.
 	 */
-	fpu_buf_cfg.min_size = possible_xstate_size;
-
-	/* Ensure the minimum size fits in the statically-allocated buffer: */
-	if (!is_supported_xstate_size(fpu_buf_cfg.min_size))
+	xstate_size = calculate_xstate_size(false);
+	if (!is_supported_xstate_size(xstate_size))
 		return -EINVAL;
 
+	fpu_buf_cfg.min_size = xstate_size;
+
 	/*
 	 * User space is always in standard format.
 	 */
-- 
2.17.1


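The size logic works as follows. calculate_xstate_size() starts from the legacy FXSAVE region plus the XSAVE header. It then walks the enabled components in feature order. A component that requires alignment is moved to the next 64-byte boundary, and dynamic user states are skipped unless include_dynamic_states is set.

Below is a minimal user-space sketch of that compacted-format walk only. The feature table (names, sizes, alignment flags, which entry is dynamic) is hypothetical and chosen for illustration rather than read from CPUID. The non-XSAVES path, which takes offsets from CPUID, is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FXSAVE_SIZE    512u  /* legacy FXSAVE region */
#define XSAVE_HDR_SIZE 64u   /* XSAVE header */

/* Hypothetical per-component description; the kernel derives this from CPUID. */
struct demo_xfeature {
	const char *name;
	unsigned int size;
	bool aligned64;  /* starts on the next 64-byte boundary */
	bool dynamic;    /* dynamic user state, allocated on demand */
};

/* Illustrative table in feature-number order; values are not authoritative. */
static const struct demo_xfeature demo_features[] = {
	{ "ymm",       256, false, false },
	{ "pkru",        8, false, false },
	{ "tilecfg",    64, false, false },
	{ "tiledata", 8192, true,  true  },
};

static unsigned int align_up(unsigned int x, unsigned int a)
{
	assert(a && (a & (a - 1)) == 0);  /* alignment must be a power of two */
	return (x + a - 1) & ~(a - 1);
}

/* Compacted-format size, optionally including the dynamic user states. */
static unsigned int demo_xstate_size(bool include_dynamic_states)
{
	unsigned int size = FXSAVE_SIZE + XSAVE_HDR_SIZE;
	size_t i;

	for (i = 0; i < sizeof(demo_features) / sizeof(demo_features[0]); i++) {
		const struct demo_xfeature *f = &demo_features[i];

		if (f->dynamic && !include_dynamic_states)
			continue;
		if (f->aligned64)
			size = align_up(size, 64);
		size += f->size;
	}
	return size;
}
```

With this table, demo_xstate_size(false) returns 904, the size the embedded buffer would need. demo_xstate_size(true) returns 9152, because the 8192-byte dynamic component is moved from offset 904 to the next 64-byte boundary at 960. In init_xstate_size(), the "with dynamic states" result is cross-checked against the CPUID-reported maximum, and the "without" result becomes fpu_buf_cfg.min_size.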

* [PATCH v11 08/29] x86/fpu/xstate: Convert the struct fpu 'state' field to a pointer
  2021-10-01 22:36 [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (6 preceding siblings ...)
  2021-10-01 22:37 ` [PATCH v11 07/29] x86/fpu/xstate: Calculate and remember dynamic XSTATE buffer sizes Chang S. Bae
@ 2021-10-01 22:37 ` Chang S. Bae
  2021-10-01 22:37 ` [PATCH v11 09/29] x86/fpu/xstate: Introduce helpers to manage the XSTATE buffer dynamically Chang S. Bae
                   ` (23 subsequent siblings)
  31 siblings, 0 replies; 45+ messages in thread
From: Chang S. Bae @ 2021-10-01 22:37 UTC
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, lenb, dave.hansen, thiago.macieira, jing2.liu,
	ravi.v.shankar, linux-kernel, chang.seok.bae, kvm

The XSTATE per-task buffer is embedded into struct fpu. The field 'state'
represents the buffer. When the dynamic user state is in use, the buffer
may be dynamically allocated.

Convert the 'state' field to point either to the embedded buffer or to the
dynamically-allocated buffer. Also, add a new field to represent the
embedded buffer.

The initial task sets the pointer before its soft FPU (math emulation) state
is initialized. Make sure that every FPU state gets a valid pointer when it
is created.
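The pointer must be reset whenever an fpu is created by copying another one. A byte-wise copy leaves the new pointer aimed at the source's buffer. x86 copies the whole task structure with memcpy() in arch_dup_task_struct() before the FPU is cloned, which is why fpu_clone() re-points dst_fpu->state.

The sketch below models that in user space. union demo_regs, struct demo_fpu and the demo_ helpers are hypothetical stand-ins, and default_state plays the role of __default_state.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for union fpregs_state; the real one is far larger. */
union demo_regs {
	unsigned char bytes[64];
};

struct demo_fpu {
	union demo_regs *state;        /* active buffer: embedded or heap */
	union demo_regs default_state; /* embedded buffer, kept last */
};

/* Point ->state at the embedded buffer. */
static void demo_fpu_init_state(struct demo_fpu *fpu)
{
	fpu->state = &fpu->default_state;
}

/* Model of task duplication followed by the FPU fix-up. */
static void demo_fpu_clone(struct demo_fpu *dst, const struct demo_fpu *src)
{
	memcpy(dst, src, sizeof(*dst));
	/* The stale pointer: dst->state still refers to src's buffer. */
	assert(dst->state == src->state);

	demo_fpu_init_state(dst);
	/* Copy from src's active buffer, which may be a heap buffer. */
	memcpy(dst->state, src->state, sizeof(*dst->state));
}
```

Without the demo_fpu_init_state() call, parent and child would share one register buffer.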

No functional change.

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: kvm@vger.kernel.org
---
Changes from v9:
* Update the code comment. (Borislav Petkov)

Changes from v5:
* Tightened up task size calculation (previously, it could over-calculate)
* Adjusted the changelog.

Changes from v4:
* Fixed KVM's user_fpu and guest_fpu to initialize the 'state' field correctly.
* Massaged the changelog.

Changes from v3:
* Added as a new patch to simplify the buffer access. (Borislav Petkov)
---
 arch/x86/include/asm/fpu/internal.h |  2 +-
 arch/x86/include/asm/fpu/types.h    | 31 +++++++++++++++++++++-------
 arch/x86/include/asm/trace/fpu.h    |  4 ++--
 arch/x86/kernel/fpu/core.c          | 32 +++++++++++++++--------------
 arch/x86/kernel/fpu/init.c          |  8 +++++---
 arch/x86/kernel/fpu/regset.c        | 24 +++++++++++-----------
 arch/x86/kernel/fpu/signal.c        | 24 +++++++++++-----------
 arch/x86/kernel/fpu/xstate.c        |  8 ++++----
 arch/x86/kernel/process.c           |  2 +-
 arch/x86/kvm/x86.c                  | 22 +++++++++++---------
 arch/x86/math-emu/fpu_aux.c         |  2 +-
 arch/x86/math-emu/fpu_entry.c       |  4 ++--
 arch/x86/math-emu/fpu_system.h      |  2 +-
 13 files changed, 94 insertions(+), 71 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index c7a64e2806a9..d2fc19c0e457 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -484,7 +484,7 @@ static inline void fpregs_restore_userregs(void)
 		 */
 		mask = xfeatures_mask_restore_user() |
 			xfeatures_mask_supervisor();
-		__restore_fpregs_from_fpstate(&fpu->state, mask);
+		__restore_fpregs_from_fpstate(fpu->state, mask);
 
 		fpregs_activate(fpu);
 		fpu->last_cpu = cpu;
diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index f5a38a5f3ae1..ad5cbf922e30 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -339,15 +339,32 @@ struct fpu {
 	/*
 	 * @state:
 	 *
-	 * In-memory copy of all FPU registers that we save/restore
-	 * over context switches. If the task is using the FPU then
-	 * the registers in the FPU are more recent than this state
-	 * copy. If the task context-switches away then they get
-	 * saved here and represent the FPU state.
+	 * A pointer to indicate the in-memory copy of all FPU registers
+	 * that are saved/restored over context switches.
+	 *
+	 * Initially @state points to @__default_state. When dynamic states
+	 * get used, memory is allocated for the larger state copy and
+	 * @state is updated to point to it. Then, the state in ->state
+	 * supersedes and invalidates the state in @__default_state.
+	 *
+	 * In general, if the task is using the FPU then the registers in
+	 * the FPU are more recent than the state copy. If the task
+	 * context-switches away then they get saved in ->state and
+	 * represent the FPU state.
+	 */
+	union fpregs_state		*state;
+
+	/*
+	 * @__default_state:
+	 *
+	 * Initial in-memory copy of all FPU registers that are
+	 * saved/restored over context switches. When the task is switched
+	 * to dynamic states, this copy is replaced with the new in-memory
+	 * copy in ->state.
 	 */
-	union fpregs_state		state;
+	union fpregs_state		__default_state;
 	/*
-	 * WARNING: 'state' is dynamically-sized.  Do not put
+	 * WARNING: '__default_state' is dynamically-sized.  Do not put
 	 * anything after it here.
 	 */
 };
diff --git a/arch/x86/include/asm/trace/fpu.h b/arch/x86/include/asm/trace/fpu.h
index 879b77792f94..ef82f4824ce7 100644
--- a/arch/x86/include/asm/trace/fpu.h
+++ b/arch/x86/include/asm/trace/fpu.h
@@ -22,8 +22,8 @@ DECLARE_EVENT_CLASS(x86_fpu,
 		__entry->fpu		= fpu;
 		__entry->load_fpu	= test_thread_flag(TIF_NEED_FPU_LOAD);
 		if (boot_cpu_has(X86_FEATURE_OSXSAVE)) {
-			__entry->xfeatures = fpu->state.xsave.header.xfeatures;
-			__entry->xcomp_bv  = fpu->state.xsave.header.xcomp_bv;
+			__entry->xfeatures = fpu->state->xsave.header.xfeatures;
+			__entry->xcomp_bv  = fpu->state->xsave.header.xcomp_bv;
 		}
 	),
 	TP_printk("x86/fpu: %p load: %d xfeatures: %llx xcomp_bv: %llx",
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 62cc993a890a..6b55b8c651f6 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -99,19 +99,19 @@ EXPORT_SYMBOL(irq_fpu_usable);
 void save_fpregs_to_fpstate(struct fpu *fpu)
 {
 	if (likely(use_xsave())) {
-		os_xsave(&fpu->state.xsave);
+		os_xsave(&fpu->state->xsave);
 
 		/*
 		 * AVX512 state is tracked here because its use is
 		 * known to slow the max clock speed of the core.
 		 */
-		if (fpu->state.xsave.header.xfeatures & XFEATURE_MASK_AVX512)
+		if (fpu->state->xsave.header.xfeatures & XFEATURE_MASK_AVX512)
 			fpu->avx512_timestamp = jiffies;
 		return;
 	}
 
 	if (likely(use_fxsr())) {
-		fxsave(&fpu->state.fxsave);
+		fxsave(&fpu->state->fxsave);
 		return;
 	}
 
@@ -119,8 +119,8 @@ void save_fpregs_to_fpstate(struct fpu *fpu)
 	 * Legacy FPU register saving, FNSAVE always clears FPU registers,
 	 * so we have to reload them from the memory state.
 	 */
-	asm volatile("fnsave %[fp]; fwait" : [fp] "=m" (fpu->state.fsave));
-	frstor(&fpu->state.fsave);
+	asm volatile("fnsave %[fp]; fwait" : [fp] "=m" (fpu->state->fsave));
+	frstor(&fpu->state->fsave);
 }
 EXPORT_SYMBOL(save_fpregs_to_fpstate);
 
@@ -235,7 +235,7 @@ void fpstate_init(struct fpu *fpu)
 	u64 mask;
 
 	if (likely(fpu)) {
-		state = &fpu->state;
+		state = fpu->state;
 		/* The dynamic user states are not prepared yet. */
 		mask = xfeatures_mask_all & ~xfeatures_mask_user_dynamic;
 		size = fpu_buf_cfg.min_size;
@@ -274,6 +274,8 @@ int fpu_clone(struct task_struct *dst)
 	if (!cpu_feature_enabled(X86_FEATURE_FPU))
 		return 0;
 
+	dst_fpu->state = &dst_fpu->__default_state;
+
 	/*
 	 * Don't let 'init optimized' areas of the XSAVE area
 	 * leak into the child task:
@@ -281,7 +283,7 @@ int fpu_clone(struct task_struct *dst)
 	 * The child does not inherit the dynamic states. So,
 	 * the xstate buffer has the minimum size.
 	 */
-	memset(&dst_fpu->state.xsave, 0, fpu_buf_cfg.min_size);
+	memset(&dst_fpu->state->xsave, 0, fpu_buf_cfg.min_size);
 
 	/*
 	 * If the FPU registers are not owned by current just memcpy() the
@@ -290,7 +292,7 @@ int fpu_clone(struct task_struct *dst)
 	 */
 	fpregs_lock();
 	if (test_thread_flag(TIF_NEED_FPU_LOAD))
-		memcpy(&dst_fpu->state, &src_fpu->state, fpu_buf_cfg.min_size);
+		memcpy(dst_fpu->state, src_fpu->state, fpu_buf_cfg.min_size);
 
 	else
 		save_fpregs_to_fpstate(dst_fpu);
@@ -377,7 +379,7 @@ static void fpu_reset_fpstate(void)
 	 * user space as PKRU is eagerly written in switch_to() and
 	 * flush_thread().
 	 */
-	memcpy(&fpu->state, &init_fpstate, init_fpstate_copy_size());
+	memcpy(fpu->state, &init_fpstate, init_fpstate_copy_size());
 	set_thread_flag(TIF_NEED_FPU_LOAD);
 	fpregs_unlock();
 }
@@ -404,7 +406,7 @@ void fpu__clear_user_states(struct fpu *fpu)
 	 */
 	if (xfeatures_mask_supervisor() &&
 	    !fpregs_state_valid(fpu, smp_processor_id())) {
-		os_xrstor(&fpu->state.xsave, xfeatures_mask_supervisor());
+		os_xrstor(&fpu->state->xsave, xfeatures_mask_supervisor());
 	}
 
 	/* Reset user states in registers. */
@@ -486,11 +488,11 @@ int fpu__exception_code(struct fpu *fpu, int trap_nr)
 		 * fully reproduce the context of the exception.
 		 */
 		if (boot_cpu_has(X86_FEATURE_FXSR)) {
-			cwd = fpu->state.fxsave.cwd;
-			swd = fpu->state.fxsave.swd;
+			cwd = fpu->state->fxsave.cwd;
+			swd = fpu->state->fxsave.swd;
 		} else {
-			cwd = (unsigned short)fpu->state.fsave.cwd;
-			swd = (unsigned short)fpu->state.fsave.swd;
+			cwd = (unsigned short)fpu->state->fsave.cwd;
+			swd = (unsigned short)fpu->state->fsave.swd;
 		}
 
 		err = swd & ~cwd;
@@ -504,7 +506,7 @@ int fpu__exception_code(struct fpu *fpu, int trap_nr)
 		unsigned short mxcsr = MXCSR_DEFAULT;
 
 		if (boot_cpu_has(X86_FEATURE_XMM))
-			mxcsr = fpu->state.fxsave.mxcsr;
+			mxcsr = fpu->state->fxsave.mxcsr;
 
 		err = ~(mxcsr >> 7) & mxcsr;
 	}
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index da7341f95008..cd1f3114f3ca 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -31,10 +31,12 @@ static void fpu__init_cpu_generic(void)
 		cr0 |= X86_CR0_EM;
 	write_cr0(cr0);
 
+	current->thread.fpu.state = &current->thread.fpu.__default_state;
+
 	/* Flush out any pending x87 state: */
 #ifdef CONFIG_MATH_EMULATION
 	if (!boot_cpu_has(X86_FEATURE_FPU))
-		fpstate_init_soft(&current->thread.fpu.state.soft);
+		fpstate_init_soft(&current->thread.fpu.state->soft);
 	else
 #endif
 		asm volatile ("fninit");
@@ -153,7 +155,7 @@ static void __init fpu__init_task_struct_size(void)
 	 * Subtract off the static size of the register state.
 	 * It potentially has a bunch of padding.
 	 */
-	task_size -= sizeof(((struct task_struct *)0)->thread.fpu.state);
+	task_size -= sizeof(((struct task_struct *)0)->thread.fpu.__default_state);
 
 	/*
 	 * Add back the dynamically-calculated register state
@@ -170,7 +172,7 @@ static void __init fpu__init_task_struct_size(void)
 	 * you hit a compile error here, check the structure to
 	 * see if something got added to the end.
 	 */
-	CHECK_MEMBER_AT_END_OF(struct fpu, state);
+	CHECK_MEMBER_AT_END_OF(struct fpu, __default_state);
 	CHECK_MEMBER_AT_END_OF(struct thread_struct, fpu);
 	CHECK_MEMBER_AT_END_OF(struct task_struct, thread);
 
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index 80ee64183c7d..7ea10f98c2b0 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -74,8 +74,8 @@ int xfpregs_get(struct task_struct *target, const struct user_regset *regset,
 	sync_fpstate(fpu);
 
 	if (!use_xsave()) {
-		return membuf_write(&to, &fpu->state.fxsave,
-				    sizeof(fpu->state.fxsave));
+		return membuf_write(&to, &fpu->state->fxsave,
+				    sizeof(fpu->state->fxsave));
 	}
 
 	copy_xstate_to_uabi_buf(to, target, XSTATE_COPY_FX);
@@ -110,15 +110,15 @@ int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
 	fpu_force_restore(fpu);
 
 	/* Copy the state  */
-	memcpy(&fpu->state.fxsave, &newstate, sizeof(newstate));
+	memcpy(&fpu->state->fxsave, &newstate, sizeof(newstate));
 
 	/* Clear xmm8..15 */
-	BUILD_BUG_ON(sizeof(fpu->state.fxsave.xmm_space) != 16 * 16);
-	memset(&fpu->state.fxsave.xmm_space[8], 0, 8 * 16);
+	BUILD_BUG_ON(sizeof(fpu->state->fxsave.xmm_space) != 16 * 16);
+	memset(&fpu->state->fxsave.xmm_space[8], 0, 8 * 16);
 
 	/* Mark FP and SSE as in use when XSAVE is enabled */
 	if (use_xsave())
-		fpu->state.xsave.header.xfeatures |= XFEATURE_MASK_FPSSE;
+		fpu->state->xsave.header.xfeatures |= XFEATURE_MASK_FPSSE;
 
 	return 0;
 }
@@ -283,7 +283,7 @@ static void __convert_from_fxsr(struct user_i387_ia32_struct *env,
 void
 convert_from_fxsr(struct user_i387_ia32_struct *env, struct task_struct *tsk)
 {
-	__convert_from_fxsr(env, tsk, &tsk->thread.fpu.state.fxsave);
+	__convert_from_fxsr(env, tsk, &tsk->thread.fpu.state->fxsave);
 }
 
 void convert_to_fxsr(struct fxregs_state *fxsave,
@@ -326,7 +326,7 @@ int fpregs_get(struct task_struct *target, const struct user_regset *regset,
 		return fpregs_soft_get(target, regset, to);
 
 	if (!cpu_feature_enabled(X86_FEATURE_FXSR)) {
-		return membuf_write(&to, &fpu->state.fsave,
+		return membuf_write(&to, &fpu->state->fsave,
 				    sizeof(struct fregs_state));
 	}
 
@@ -337,7 +337,7 @@ int fpregs_get(struct task_struct *target, const struct user_regset *regset,
 		copy_xstate_to_uabi_buf(mb, target, XSTATE_COPY_FP);
 		fx = &fxsave;
 	} else {
-		fx = &fpu->state.fxsave;
+		fx = &fpu->state->fxsave;
 	}
 
 	__convert_from_fxsr(&env, target, fx);
@@ -366,16 +366,16 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset,
 	fpu_force_restore(fpu);
 
 	if (cpu_feature_enabled(X86_FEATURE_FXSR))
-		convert_to_fxsr(&fpu->state.fxsave, &env);
+		convert_to_fxsr(&fpu->state->fxsave, &env);
 	else
-		memcpy(&fpu->state.fsave, &env, sizeof(env));
+		memcpy(&fpu->state->fsave, &env, sizeof(env));
 
 	/*
 	 * Update the header bit in the xsave header, indicating the
 	 * presence of FP.
 	 */
 	if (cpu_feature_enabled(X86_FEATURE_XSAVE))
-		fpu->state.xsave.header.xfeatures |= XFEATURE_MASK_FP;
+		fpu->state->xsave.header.xfeatures |= XFEATURE_MASK_FP;
 
 	return 0;
 }
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index f5ec334c5a4e..8b333b1a4d07 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -67,13 +67,13 @@ static inline int check_xstate_in_sigframe(struct fxregs_state __user *fxbuf,
 static inline int save_fsave_header(struct task_struct *tsk, void __user *buf)
 {
 	if (use_fxsr()) {
-		struct xregs_state *xsave = &tsk->thread.fpu.state.xsave;
+		struct xregs_state *xsave = &tsk->thread.fpu.state->xsave;
 		struct user_i387_ia32_struct env;
 		struct _fpstate_32 __user *fp = buf;
 
 		fpregs_lock();
 		if (!test_thread_flag(TIF_NEED_FPU_LOAD))
-			fxsave(&tsk->thread.fpu.state.fxsave);
+			fxsave(&tsk->thread.fpu.state->fxsave);
 		fpregs_unlock();
 
 		convert_from_fxsr(&env, tsk);
@@ -294,7 +294,7 @@ static int restore_fpregs_from_user(void __user *buf, u64 xrestore,
 	 * been restored from a user buffer directly.
 	 */
 	if (test_thread_flag(TIF_NEED_FPU_LOAD) && xfeatures_mask_supervisor())
-		os_xrstor(&fpu->state.xsave, xfeatures_mask_supervisor());
+		os_xrstor(&fpu->state->xsave, xfeatures_mask_supervisor());
 
 	fpregs_mark_activate();
 	fpregs_unlock();
@@ -365,7 +365,7 @@ static int __fpu_restore_sig(void __user *buf, void __user *buf_fx,
 		 * the right place in memory. It's ia32 mode. Shrug.
 		 */
 		if (xfeatures_mask_supervisor())
-			os_xsave(&fpu->state.xsave);
+			os_xsave(&fpu->state->xsave);
 		set_thread_flag(TIF_NEED_FPU_LOAD);
 	}
 	__fpu_invalidate_fpregs_state(fpu);
@@ -377,21 +377,21 @@ static int __fpu_restore_sig(void __user *buf, void __user *buf_fx,
 		if (ret)
 			return ret;
 	} else {
-		if (__copy_from_user(&fpu->state.fxsave, buf_fx,
-				     sizeof(fpu->state.fxsave)))
+		if (__copy_from_user(&fpu->state->fxsave, buf_fx,
+				     sizeof(fpu->state->fxsave)))
 			return -EFAULT;
 
 		/* Reject invalid MXCSR values. */
-		if (fpu->state.fxsave.mxcsr & ~mxcsr_feature_mask)
+		if (fpu->state->fxsave.mxcsr & ~mxcsr_feature_mask)
 			return -EINVAL;
 
 		/* Enforce XFEATURE_MASK_FPSSE when XSAVE is enabled */
 		if (use_xsave())
-			fpu->state.xsave.header.xfeatures |= XFEATURE_MASK_FPSSE;
+			fpu->state->xsave.header.xfeatures |= XFEATURE_MASK_FPSSE;
 	}
 
 	/* Fold the legacy FP storage */
-	convert_to_fxsr(&fpu->state.fxsave, &env);
+	convert_to_fxsr(&fpu->state->fxsave, &env);
 
 	fpregs_lock();
 	if (use_xsave()) {
@@ -406,10 +406,10 @@ static int __fpu_restore_sig(void __user *buf, void __user *buf_fx,
 		 */
 		u64 mask = user_xfeatures | xfeatures_mask_supervisor();
 
-		fpu->state.xsave.header.xfeatures &= mask;
-		ret = os_xrstor_safe(&fpu->state.xsave, xfeatures_mask_all);
+		fpu->state->xsave.header.xfeatures &= mask;
+		ret = os_xrstor_safe(&fpu->state->xsave, xfeatures_mask_all);
 	} else {
-		ret = fxrstor_safe(&fpu->state.fxsave);
+		ret = fxrstor_safe(&fpu->state->fxsave);
 	}
 
 	if (likely(!ret))
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 2e474fbdc241..4496750208a8 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -882,7 +882,7 @@ static void *__raw_xsave_addr(struct fpu *fpu, int xfeature_nr)
 	}
 
 	if (fpu)
-		xsave = &fpu->state.xsave;
+		xsave = &fpu->state->xsave;
 	else
 		xsave = &init_fpstate.xsave;
 
@@ -925,7 +925,7 @@ void *get_xsave_addr(struct fpu *fpu, int xfeature_nr)
 		  "get of unsupported state");
 
 	if (fpu)
-		xsave = &fpu->state.xsave;
+		xsave = &fpu->state->xsave;
 	else
 		xsave = &init_fpstate.xsave;
 
@@ -1017,7 +1017,7 @@ void copy_xstate_to_uabi_buf(struct membuf to, struct task_struct *tsk,
 			     enum xstate_copy_mode copy_mode)
 {
 	const unsigned int off_mxcsr = offsetof(struct fxregs_state, mxcsr);
-	struct xregs_state *xsave = &tsk->thread.fpu.state.xsave;
+	struct xregs_state *xsave = &tsk->thread.fpu.state->xsave;
 	struct xregs_state *xinit = &init_fpstate.xsave;
 	struct xstate_header header;
 	unsigned int zerofrom;
@@ -1134,7 +1134,7 @@ static int copy_from_buffer(void *dst, unsigned int offset, unsigned int size,
 static int copy_uabi_to_xstate(struct fpu *fpu, const void *kbuf,
 			       const void __user *ubuf)
 {
-	struct xregs_state *xsave = &fpu->state.xsave;
+	struct xregs_state *xsave = &fpu->state->xsave;
 	unsigned int offset, size;
 	struct xstate_header hdr;
 	u64 mask;
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index d1ca963cb8f7..33f5d8d07367 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -92,7 +92,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 
 void arch_thread_struct_whitelist(unsigned long *offset, unsigned long *size)
 {
-	*offset = offsetof(struct thread_struct, fpu.state);
+	*offset = offsetof(struct thread_struct, fpu.__default_state);
 	/* The buffer embedded in thread_struct has the minimum size. */
 	*size = fpu_buf_cfg.min_size;
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 948c207a8752..7c6dcc21e962 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4697,7 +4697,7 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,
 
 static void fill_xsave(u8 *dest, struct kvm_vcpu *vcpu)
 {
-	struct xregs_state *xsave = &vcpu->arch.guest_fpu->state.xsave;
+	struct xregs_state *xsave = &vcpu->arch.guest_fpu->state->xsave;
 	u64 xstate_bv = xsave->header.xfeatures;
 	u64 valid;
 
@@ -4740,7 +4740,7 @@ static void fill_xsave(u8 *dest, struct kvm_vcpu *vcpu)
 
 static void load_xsave(struct kvm_vcpu *vcpu, u8 *src)
 {
-	struct xregs_state *xsave = &vcpu->arch.guest_fpu->state.xsave;
+	struct xregs_state *xsave = &vcpu->arch.guest_fpu->state->xsave;
 	u64 xstate_bv = *(u64 *)(src + XSAVE_HDR_OFFSET);
 	u64 valid;
 
@@ -4793,7 +4793,7 @@ static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
 		fill_xsave((u8 *) guest_xsave->region, vcpu);
 	} else {
 		memcpy(guest_xsave->region,
-			&vcpu->arch.guest_fpu->state.fxsave,
+			&vcpu->arch.guest_fpu->state->fxsave,
 			sizeof(struct fxregs_state));
 		*(u64 *)&guest_xsave->region[XSAVE_HDR_OFFSET / sizeof(u32)] =
 			XFEATURE_MASK_FPSSE;
@@ -4827,7 +4827,7 @@ static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu,
 		if (xstate_bv & ~XFEATURE_MASK_FPSSE ||
 			mxcsr & ~mxcsr_feature_mask)
 			return -EINVAL;
-		memcpy(&vcpu->arch.guest_fpu->state.fxsave,
+		memcpy(&vcpu->arch.guest_fpu->state->fxsave,
 			guest_xsave->region, sizeof(struct fxregs_state));
 	}
 	return 0;
@@ -9909,7 +9909,7 @@ static void kvm_save_current_fpu(struct fpu *fpu)
 	 * always has the minimum size.
 	 */
 	if (test_thread_flag(TIF_NEED_FPU_LOAD))
-		memcpy(&fpu->state, &current->thread.fpu.state,
+		memcpy(fpu->state, current->thread.fpu.state,
 		       fpu_buf_cfg.min_size);
 	else
 		save_fpregs_to_fpstate(fpu);
@@ -9928,7 +9928,7 @@ static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
 	 */
 	if (vcpu->arch.guest_fpu)
 		/* PKRU is separately restored in kvm_x86_ops.run. */
-		__restore_fpregs_from_fpstate(&vcpu->arch.guest_fpu->state,
+		__restore_fpregs_from_fpstate(vcpu->arch.guest_fpu->state,
 					~XFEATURE_MASK_PKRU);
 
 	fpregs_mark_activate();
@@ -9949,7 +9949,7 @@ static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
 	if (vcpu->arch.guest_fpu)
 		kvm_save_current_fpu(vcpu->arch.guest_fpu);
 
-	restore_fpregs_from_fpstate(&vcpu->arch.user_fpu->state);
+	restore_fpregs_from_fpstate(vcpu->arch.user_fpu->state);
 
 	fpregs_mark_activate();
 	fpregs_unlock();
@@ -10539,7 +10539,7 @@ int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
 
 	vcpu_load(vcpu);
 
-	fxsave = &vcpu->arch.guest_fpu->state.fxsave;
+	fxsave = &vcpu->arch.guest_fpu->state->fxsave;
 	memcpy(fpu->fpr, fxsave->st_space, 128);
 	fpu->fcw = fxsave->cwd;
 	fpu->fsw = fxsave->swd;
@@ -10562,7 +10562,7 @@ int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
 
 	vcpu_load(vcpu);
 
-	fxsave = &vcpu->arch.guest_fpu->state.fxsave;
+	fxsave = &vcpu->arch.guest_fpu->state->fxsave;
 
 	memcpy(fxsave->st_space, fpu->fpr, 128);
 	fxsave->cwd = fpu->fcw;
@@ -10620,7 +10620,7 @@ static void fx_init(struct kvm_vcpu *vcpu)
 
 	fpstate_init(vcpu->arch.guest_fpu);
 	if (boot_cpu_has(X86_FEATURE_XSAVES))
-		vcpu->arch.guest_fpu->state.xsave.header.xcomp_bv =
+		vcpu->arch.guest_fpu->state->xsave.header.xcomp_bv =
 			host_xcr0 | XSTATE_COMPACTION_ENABLED;
 
 	/*
@@ -10700,6 +10700,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 		pr_err("kvm: failed to allocate userspace's fpu\n");
 		goto free_emulate_ctxt;
 	}
+	vcpu->arch.user_fpu->state = &vcpu->arch.user_fpu->__default_state;
 
 	vcpu->arch.guest_fpu = kmem_cache_zalloc(x86_fpu_cache,
 						 GFP_KERNEL_ACCOUNT);
@@ -10707,6 +10708,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 		pr_err("kvm: failed to allocate vcpu's fpu\n");
 		goto free_user_fpu;
 	}
+	vcpu->arch.guest_fpu->state = &vcpu->arch.guest_fpu->__default_state;
 	fx_init(vcpu);
 
 	vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
diff --git a/arch/x86/math-emu/fpu_aux.c b/arch/x86/math-emu/fpu_aux.c
index 034748459482..51432a73024c 100644
--- a/arch/x86/math-emu/fpu_aux.c
+++ b/arch/x86/math-emu/fpu_aux.c
@@ -53,7 +53,7 @@ void fpstate_init_soft(struct swregs_state *soft)
 
 void finit(void)
 {
-	fpstate_init_soft(&current->thread.fpu.state.soft);
+	fpstate_init_soft(&current->thread.fpu.state->soft);
 }
 
 /*
diff --git a/arch/x86/math-emu/fpu_entry.c b/arch/x86/math-emu/fpu_entry.c
index 8679a9d6c47f..6ba56632170e 100644
--- a/arch/x86/math-emu/fpu_entry.c
+++ b/arch/x86/math-emu/fpu_entry.c
@@ -640,7 +640,7 @@ int fpregs_soft_set(struct task_struct *target,
 		    unsigned int pos, unsigned int count,
 		    const void *kbuf, const void __user *ubuf)
 {
-	struct swregs_state *s387 = &target->thread.fpu.state.soft;
+	struct swregs_state *s387 = &target->thread.fpu.state->soft;
 	void *space = s387->st_space;
 	int ret;
 	int offset, other, i, tags, regnr, tag, newtop;
@@ -691,7 +691,7 @@ int fpregs_soft_get(struct task_struct *target,
 		    const struct user_regset *regset,
 		    struct membuf to)
 {
-	struct swregs_state *s387 = &target->thread.fpu.state.soft;
+	struct swregs_state *s387 = &target->thread.fpu.state->soft;
 	const void *space = s387->st_space;
 	int offset = (S387->ftop & 7) * 10, other = 80 - offset;
 
diff --git a/arch/x86/math-emu/fpu_system.h b/arch/x86/math-emu/fpu_system.h
index 9b41391867dc..a6291ddfdda6 100644
--- a/arch/x86/math-emu/fpu_system.h
+++ b/arch/x86/math-emu/fpu_system.h
@@ -73,7 +73,7 @@ static inline bool seg_writable(struct desc_struct *d)
 	return (d->type & SEG_TYPE_EXECUTE_MASK) == SEG_TYPE_WRITABLE;
 }
 
-#define I387			(&current->thread.fpu.state)
+#define I387			(current->thread.fpu.state)
 #define FPU_info		(I387->soft.info)
 
 #define FPU_CS			(*(unsigned short *) &(FPU_info->regs->cs))
-- 
2.17.1



* [PATCH v11 09/29] x86/fpu/xstate: Introduce helpers to manage the XSTATE buffer dynamically
  2021-10-01 22:36 [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (7 preceding siblings ...)
  2021-10-01 22:37 ` [PATCH v11 08/29] x86/fpu/xstate: Convert the struct fpu 'state' field to a pointer Chang S. Bae
@ 2021-10-01 22:37 ` Chang S. Bae
  2021-10-01 22:37 ` [PATCH v11 10/29] x86/fpu/xstate: Update the XSTATE save function to support dynamic states Chang S. Bae
                   ` (22 subsequent siblings)
  31 siblings, 0 replies; 45+ messages in thread
From: Chang S. Bae @ 2021-10-01 22:37 UTC
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, lenb, dave.hansen, thiago.macieira, jing2.liu,
	ravi.v.shankar, linux-kernel, chang.seok.bae

The static per-task XSTATE buffer holds the extended register states, but
it cannot be expanded at runtime. Introduce runtime helpers and a new
struct fpu field to support the expansion.

fpu->state_mask indicates which state components are to be saved in the
XSTATE buffer.

realloc_xstate_buffer() uses vzalloc(). If use of this mechanism grows to
re-allocate buffers larger than 64KB, a more sophisticated allocation
scheme that includes purpose-built reclaim capability might be justified.

Introduce a new helper, calculate_xstate_buf_size_from_mask(), to
calculate the buffer size.

Also, use the new field and helper to initialize the buffer.
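The body of realloc_xstate_buffer() is not part of this excerpt. The changelog describes this pattern:

* size the buffer from a component mask
* allocate it zeroed
* switch fpu->state over to it
* record the mask in state_mask
* never free the embedded buffer

Here is a rough, hypothetical user-space model of that pattern. calloc() stands in for vzalloc(), demo_size_from_mask() is a toy substitute for calculate_xstate_buf_size_from_mask(), and every constant is made up. Copying the old contents into the new buffer is a choice of this sketch, not necessarily what the kernel helper does.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MIN_SIZE     832u         /* hypothetical embedded-buffer size */
#define DEMO_DYNAMIC_BIT  (1ULL << 18) /* hypothetical dynamic component */
#define DEMO_DYNAMIC_SIZE 8192u        /* hypothetical size of that component */

struct demo_fpu {
	uint64_t state_mask;  /* components ->state must hold */
	void *state;          /* embedded or heap buffer */
	unsigned char default_state[DEMO_MIN_SIZE];
};

/* Toy substitute for calculate_xstate_buf_size_from_mask(). */
static unsigned int demo_size_from_mask(uint64_t mask)
{
	return DEMO_MIN_SIZE + ((mask & DEMO_DYNAMIC_BIT) ? DEMO_DYNAMIC_SIZE : 0u);
}

/* Replace ->state with a zeroed buffer large enough for @mask. */
static int demo_realloc_state(struct demo_fpu *fpu, uint64_t mask)
{
	unsigned int old_size = demo_size_from_mask(fpu->state_mask);
	unsigned int new_size = demo_size_from_mask(mask);
	void *buf = calloc(1, new_size);  /* stands in for vzalloc() */

	if (!buf)
		return -ENOMEM;

	/* Carry over as much of the old contents as fits. */
	memcpy(buf, fpu->state, old_size < new_size ? old_size : new_size);

	/* Never free the embedded buffer; only a previous heap buffer. */
	if (fpu->state != (void *)fpu->default_state)
		free(fpu->state);

	fpu->state = buf;
	fpu->state_mask = mask;
	return 0;
}
```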

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v10:
* Update comment and a variable name for XSTATE calculation function. (Dave
  Hansen)

Changes from v9:
* Rename and simplify helpers. (Borislav Petkov)
* Add and fix the code comment and the variable name. (Borislav Petkov)
* Use cpu_feature_enabled() instead of boot_cpu_has(). (Borislav Petkov)
* Use fpu->state_mask to ensure states to be written in
  copy_uabi_to_xstate() -- moved from Patch11. (Borislav Petkov)

Changes from v5:
* Added to ensure XSAVES format with current in fpu_reset_fpstate() for new
  base code.

Changes from v3:
* Updated code comments. (Borislav Petkov)
* Used vzalloc() instead of vmalloc() with memset(). (Borislav Petkov)
* Removed the max size check for >64KB. (Borislav Petkov)
* Removed the allocation size check in the helper. (Borislav Petkov)
* Switched the function description in the kernel-doc style.
* Used them for buffer initialization -- moved from the next patch.

Changes from v2:
* Updated the changelog with task->fpu removed. (Borislav Petkov)
* Replaced 'area' with 'buffer' in the comments and the changelog.
* Updated the code comments.

Changes from v1:
* Removed unneeded interrupt masking (Andy Lutomirski)
* Added vmalloc() error tracing (Dave Hansen, PeterZ, and Andy Lutomirski)
---
 arch/x86/include/asm/fpu/types.h  |   7 ++
 arch/x86/include/asm/fpu/xstate.h |   3 +
 arch/x86/kernel/fpu/core.c        |  19 +++--
 arch/x86/kernel/fpu/xstate.c      | 125 ++++++++++++++++++++++++++++++
 4 files changed, 147 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index ad5cbf922e30..0cc9f6c5a10c 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -336,6 +336,13 @@ struct fpu {
 	 */
 	unsigned long			avx512_timestamp;
 
+	/*
+	 * @state_mask:
+	 *
+	 * The bitmap represents state components to be saved in ->state.
+	 */
+	u64				state_mask;
+
 	/*
 	 * @state:
 	 *
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index c4a0914b7717..9574ee20c6aa 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -153,7 +153,10 @@ struct fpu_xstate_buffer_config {
 
 extern struct fpu_xstate_buffer_config fpu_buf_cfg;
 
+unsigned int calculate_xstate_buf_size_from_mask(u64 mask);
 void *get_xsave_addr(struct fpu *fpu, int xfeature_nr);
+int realloc_xstate_buffer(struct fpu *fpu, u64 mask);
+void free_xstate_buffer(union fpregs_state *state);
 int xfeature_size(int xfeature_nr);
 int copy_uabi_from_kernel_to_xstate(struct fpu *fpu, const void *kbuf);
 int copy_sigframe_from_user_to_xstate(struct fpu *fpu, const void __user *ubuf);
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 6b55b8c651f6..2941d03912db 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -236,9 +236,8 @@ void fpstate_init(struct fpu *fpu)
 
 	if (likely(fpu)) {
 		state = fpu->state;
-		/* The dynamic user states are not prepared yet. */
-		mask = xfeatures_mask_all & ~xfeatures_mask_user_dynamic;
-		size = fpu_buf_cfg.min_size;
+		mask = fpu->state_mask;
+		size = calculate_xstate_buf_size_from_mask(fpu->state_mask);
 	} else {
 		state = &init_fpstate;
 		mask = xfeatures_mask_all;
@@ -274,14 +273,16 @@ int fpu_clone(struct task_struct *dst)
 	if (!cpu_feature_enabled(X86_FEATURE_FPU))
 		return 0;
 
+	/*
+	 * The child does not inherit the dynamic states. Thus, use the
+	 * buffer embedded in struct task_struct, which has the minimum
+	 * size.
+	 */
+	dst_fpu->state_mask = (xfeatures_mask_all & ~xfeatures_mask_user_dynamic);
 	dst_fpu->state = &dst_fpu->__default_state;
-
 	/*
 	 * Don't let 'init optimized' areas of the XSAVE area
 	 * leak into the child task:
-	 *
-	 * The child does not inherit the dynamic states. So,
-	 * the xstate buffer has the minimum size.
 	 */
 	memset(&dst_fpu->state->xsave, 0, fpu_buf_cfg.min_size);
 
@@ -380,6 +381,10 @@ static void fpu_reset_fpstate(void)
 	 * flush_thread().
 	 */
 	memcpy(fpu->state, &init_fpstate, init_fpstate_copy_size());
+	/* Adjust the xstate buffer format for current. */
+	if (cpu_feature_enabled(X86_FEATURE_XSAVES))
+		fpstate_init_xstate(&fpu->state->xsave, fpu->state_mask);
+
 	set_thread_flag(TIF_NEED_FPU_LOAD);
 	fpregs_unlock();
 }
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 4496750208a8..eafedb58b23b 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -10,6 +10,7 @@
 #include <linux/pkeys.h>
 #include <linux/seq_file.h>
 #include <linux/proc_fs.h>
+#include <linux/vmalloc.h>
 
 #include <asm/fpu/api.h>
 #include <asm/fpu/internal.h>
@@ -19,6 +20,7 @@
 
 #include <asm/tlbflush.h>
 #include <asm/cpufeature.h>
+#include <asm/trace/fpu.h>
 
 /*
  * Although we spell it out in here, the Processor Trace
@@ -76,6 +78,12 @@ static unsigned int xstate_comp_offsets[XFEATURE_MAX] __ro_after_init =
 	{ [ 0 ... XFEATURE_MAX - 1] = -1};
 static unsigned int xstate_supervisor_only_offsets[XFEATURE_MAX] __ro_after_init =
 	{ [ 0 ... XFEATURE_MAX - 1] = -1};
+/*
+ * True if the buffer of the corresponding XFEATURE is located on the next 64
+ * byte boundary. Otherwise, it follows the preceding component immediately.
+ */
+static bool xstate_64byte_aligned[XFEATURE_MAX] __ro_after_init =
+	{ [ 0 ... XFEATURE_MAX - 1] = false};
 
 struct fpu_xstate_buffer_config fpu_buf_cfg __ro_after_init;
 EXPORT_SYMBOL_GPL(fpu_buf_cfg);
@@ -131,6 +139,60 @@ static bool xfeature_is_supervisor(int xfeature_nr)
 	return ecx & 1;
 }
 
+/**
+ * calculate_xstate_buf_size_from_mask - Calculate the amount of space
+ *					 needed to store an xstate buffer
+ *					 with the given features
+ * @mask:	The set of components for which the space is needed.
+ *
+ * Consults values populated in setup_xstate_features(). Must be called
+ * after that setup.
+ *
+ * Returns:	The buffer size
+ */
+unsigned int calculate_xstate_buf_size_from_mask(u64 mask)
+{
+	unsigned int size = FXSAVE_SIZE + XSAVE_HDR_SIZE;
+	int i, last_feature_nr;
+
+	if (!mask)
+		return 0;
+
+	/*
+	 * The minimum buffer size excludes the dynamic user state. When a
+	 * task uses the state, the buffer can grow up to the max size.
+	 */
+	if (mask == (xfeatures_mask_all & ~xfeatures_mask_user_dynamic))
+		return fpu_buf_cfg.min_size;
+	else if (mask == xfeatures_mask_all)
+		return fpu_buf_cfg.max_size;
+
+	last_feature_nr = fls64(mask) - 1;
+	if (last_feature_nr < FIRST_EXTENDED_XFEATURE)
+		return size;
+
+	/*
+	 * Each state offset in the non-compacted format is fixed. Take the
+	 * size from the last feature 'nr'.
+	 */
+	if (!cpu_feature_enabled(X86_FEATURE_XSAVES))
+		return xstate_offsets[last_feature_nr] + xstate_sizes[last_feature_nr];
+
+	/*
+	 * With the given mask, no relevant size is found so far. So,
+	 * calculate it by summing up each state size.
+	 */
+	for (i = FIRST_EXTENDED_XFEATURE; i <= last_feature_nr; i++) {
+		if (!(mask & BIT_ULL(i)))
+			continue;
+
+		if (xstate_64byte_aligned[i])
+			size = ALIGN(size, 64);
+		size += xstate_sizes[i];
+	}
+	return size;
+}
+
 /*
  * Enable the extended processor state save/restore feature.
  * Called once per CPU onlining.
@@ -202,6 +264,7 @@ static void __init setup_xstate_features(void)
 			continue;
 
 		xstate_offsets[i] = ebx;
+		xstate_64byte_aligned[i] = (ecx & 2) ? true : false;
 
 		/*
 		 * In our xstate size checks, we assume that the highest-numbered
@@ -805,6 +868,12 @@ void __init fpu__init_system_xstate(void)
 	if (err)
 		goto out_disable;
 
+	/*
+	 * Initially, the FPU buffer used is the static one, without
+	 * dynamic states.
+	 */
+	current->thread.fpu.state_mask = (xfeatures_mask_all & ~xfeatures_mask_user_dynamic);
+
 	/*
 	 * Update info used for ptrace frames; use standard-format size and no
 	 * supervisor xstates:
@@ -995,6 +1064,60 @@ int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
 }
 #endif /* ! CONFIG_ARCH_HAS_PKEYS */
 
+void free_xstate_buffer(union fpregs_state *state)
+{
+	vfree(state);
+}
+
+/**
+ * realloc_xstate_buffer - Re-alloc a buffer with the size calculated from
+ *			   @mask.
+ *
+ * @fpu:	A struct fpu * pointer
+ * @mask:	The bitmap of components to be saved in the new
+ *		buffer.
+ *
+ * It enlarges the xstate buffer to hold dynamic states.
+ *
+ * Use vzalloc() simply here. If the task with a vzalloc()-allocated buffer
+ * tends to terminate quickly, vfree()-induced IPIs may be a concern.
+ * Caching may be helpful for this. But the task with large state is likely
+ * to live longer.
+ *
+ * Also, this method does not shrink or reclaim the buffer.
+ *
+ * Returns 0 on success, -ENOMEM on allocation error.
+ */
+int realloc_xstate_buffer(struct fpu *fpu, u64 mask)
+{
+	union fpregs_state *state;
+	u64 state_mask;
+
+	state_mask = fpu->state_mask | mask;
+	if ((state_mask & fpu->state_mask) == state_mask)
+		return 0;
+
+	state = vzalloc(calculate_xstate_buf_size_from_mask(state_mask));
+	if (!state)
+		return -ENOMEM;
+
+	if (cpu_feature_enabled(X86_FEATURE_XSAVES))
+		fpstate_init_xstate(&state->xsave, state_mask);
+
+	/* Free the old buffer */
+	if (fpu->state != &fpu->__default_state)
+		free_xstate_buffer(fpu->state);
+
+	/*
+	 * As long as the register state is intact, save the xstate in the
+	 * new buffer at the next context switch or ptrace's context
+	 * injection.
+	 */
+	fpu->state = state;
+	fpu->state_mask = state_mask;
+	return 0;
+}
+
 static void copy_feature(bool from_xstate, struct membuf *to, void *xstate,
 			 void *init_xstate, unsigned int size)
 {
@@ -1147,6 +1270,8 @@ static int copy_uabi_to_xstate(struct fpu *fpu, const void *kbuf,
 	if (validate_user_xstate_header(&hdr))
 		return -EINVAL;
 
+	hdr.xfeatures &= fpu->state_mask;
+
 	/* Validate MXCSR when any of the related features is in use */
 	mask = XFEATURE_MASK_FP | XFEATURE_MASK_SSE | XFEATURE_MASK_YMM;
 	if (hdr.xfeatures & mask) {
-- 
2.17.1



* [PATCH v11 10/29] x86/fpu/xstate: Update the XSTATE save function to support dynamic states
  2021-10-01 22:36 [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (8 preceding siblings ...)
  2021-10-01 22:37 ` [PATCH v11 09/29] x86/fpu/xstate: Introduce helpers to manage the XSTATE buffer dynamically Chang S. Bae
@ 2021-10-01 22:37 ` Chang S. Bae
  2021-10-01 22:37 ` [PATCH v11 11/29] x86/fpu/xstate: Update the XSTATE buffer address finder " Chang S. Bae
                   ` (21 subsequent siblings)
  31 siblings, 0 replies; 45+ messages in thread
From: Chang S. Bae @ 2021-10-01 22:37 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, lenb, dave.hansen, thiago.macieira, jing2.liu,
	ravi.v.shankar, linux-kernel, chang.seok.bae, kvm

Extend os_xsave() to take a mask argument that specifies which states to
save, in preparation for dynamic user state handling.

Update KVM to set a valid fpu->state_mask so that it can keep using the
core save function, save_fpregs_to_fpstate().

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: kvm@vger.kernel.org
---
Changes from v5:
* Adjusted the changelog and code for the new base code.

Changes from v3:
* Updated the changelog. (Borislav Petkov)
* Made the code change more reviewable.

Changes from v2:
* Updated the changelog to clarify the KVM code changes.
---
 arch/x86/include/asm/fpu/internal.h | 3 +--
 arch/x86/kernel/fpu/core.c          | 2 +-
 arch/x86/kernel/fpu/signal.c        | 2 +-
 arch/x86/kvm/x86.c                  | 9 +++++++--
 4 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index d2fc19c0e457..263e349ff85a 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -298,9 +298,8 @@ static inline void os_xrstor_booting(struct xregs_state *xstate)
  * Uses either XSAVE or XSAVEOPT or XSAVES depending on the CPU features
  * and command line options. The choice is permanent until the next reboot.
  */
-static inline void os_xsave(struct xregs_state *xstate)
+static inline void os_xsave(struct xregs_state *xstate, u64 mask)
 {
-	u64 mask = xfeatures_mask_all;
 	u32 lmask = mask;
 	u32 hmask = mask >> 32;
 	int err;
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 2941d03912db..164e75c37dbb 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -99,7 +99,7 @@ EXPORT_SYMBOL(irq_fpu_usable);
 void save_fpregs_to_fpstate(struct fpu *fpu)
 {
 	if (likely(use_xsave())) {
-		os_xsave(&fpu->state->xsave);
+		os_xsave(&fpu->state->xsave, fpu->state_mask);
 
 		/*
 		 * AVX512 state is tracked here because its use is
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 8b333b1a4d07..fe2732db6d6b 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -365,7 +365,7 @@ static int __fpu_restore_sig(void __user *buf, void __user *buf_fx,
 		 * the right place in memory. It's ia32 mode. Shrug.
 		 */
 		if (xfeatures_mask_supervisor())
-			os_xsave(&fpu->state->xsave);
+			os_xsave(&fpu->state->xsave, fpu->state_mask);
 		set_thread_flag(TIF_NEED_FPU_LOAD);
 	}
 	__fpu_invalidate_fpregs_state(fpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7c6dcc21e962..b0de53f4e5e9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9908,11 +9908,16 @@ static void kvm_save_current_fpu(struct fpu *fpu)
 	 * KVM does not support dynamic user states yet. Assume the buffer
 	 * always has the minimum size.
 	 */
-	if (test_thread_flag(TIF_NEED_FPU_LOAD))
+	if (test_thread_flag(TIF_NEED_FPU_LOAD)) {
 		memcpy(fpu->state, current->thread.fpu.state,
 		       fpu_buf_cfg.min_size);
-	else
+	} else {
+		struct fpu *src_fpu = &current->thread.fpu;
+
+		if (fpu->state_mask != src_fpu->state_mask)
+			fpu->state_mask = src_fpu->state_mask;
 		save_fpregs_to_fpstate(fpu);
+	}
 }
 
 /* Swap (qemu) user FPU context for the guest FPU context. */
-- 
2.17.1



* [PATCH v11 11/29] x86/fpu/xstate: Update the XSTATE buffer address finder to support dynamic states
  2021-10-01 22:36 [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (9 preceding siblings ...)
  2021-10-01 22:37 ` [PATCH v11 10/29] x86/fpu/xstate: Update the XSTATE save function to support dynamic states Chang S. Bae
@ 2021-10-01 22:37 ` Chang S. Bae
  2021-10-01 22:37 ` [PATCH v11 12/29] x86/fpu/xstate: Update the XSTATE context copy function " Chang S. Bae
                   ` (20 subsequent siblings)
  31 siblings, 0 replies; 45+ messages in thread
From: Chang S. Bae @ 2021-10-01 22:37 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, lenb, dave.hansen, thiago.macieira, jing2.liu,
	ravi.v.shankar, linux-kernel, chang.seok.bae

__raw_xsave_addr() returns a pointer to the requested component within an
XSTATE buffer by looking up a static offset table. The offset used to be
fixed, but with dynamic user states it varies with the set of components
present in the buffer.

calculate_xstate_buf_size_from_mask() already computes such offsets at
runtime. Factor that logic out into get_xstate_comp_offset() and use it in
the address finder as well.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v9:
* Update the function description. (Borislav Petkov)

Changes from v5:
* Updated for the future-proofed __raw_xsave_addr().

Changes from v3:
* Added the function description in the kernel-doc style. (Borislav Petkov)
* Removed 'no functional change' in the changelog. (Borislav Petkov)
---
 arch/x86/kernel/fpu/xstate.c | 71 +++++++++++++++++++++++++-----------
 1 file changed, 50 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index eafedb58b23b..2cb0d8c2eeeb 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -139,10 +139,36 @@ static bool xfeature_is_supervisor(int xfeature_nr)
 	return ecx & 1;
 }
 
+/**
+ * get_xstate_comp_offset - Find the feature offset in the compacted format.
+ * @mask:	The set of components located in the compacted format
+ * @feature_nr:	The feature number
+ *
+ * Returns:	The offset value
+ */
+static unsigned int get_xstate_comp_offset(u64 mask, int feature_nr)
+{
+	unsigned int next_offset, offset = 0;
+	int i;
+
+	if (feature_nr < FIRST_EXTENDED_XFEATURE)
+		return xstate_comp_offsets[feature_nr];
+
+	for (next_offset = FXSAVE_SIZE + XSAVE_HDR_SIZE, i = FIRST_EXTENDED_XFEATURE;
+	     i <= feature_nr; i++) {
+		if (!(mask & BIT_ULL(i)))
+			continue;
+
+		offset = xstate_64byte_aligned[i] ? ALIGN(next_offset, 64) : next_offset;
+		next_offset += xstate_sizes[i];
+	}
+	return offset;
+}
+
 /**
  * calculate_xstate_buf_size_from_mask - Calculate the amount of space
- *					 needed to store an xstate buffer
- *					 with the given features
+ *					 needed to store buffer with the
+ *					 given features.
  * @mask:	The set of components for which the space is needed.
  *
  * Consults values populated in setup_xstate_features(). Must be called
@@ -152,8 +178,8 @@ static bool xfeature_is_supervisor(int xfeature_nr)
  */
 unsigned int calculate_xstate_buf_size_from_mask(u64 mask)
 {
-	unsigned int size = FXSAVE_SIZE + XSAVE_HDR_SIZE;
-	int i, last_feature_nr;
+	unsigned int offset;
+	int last_feature_nr;
 
 	if (!mask)
 		return 0;
@@ -169,7 +195,7 @@ unsigned int calculate_xstate_buf_size_from_mask(u64 mask)
 
 	last_feature_nr = fls64(mask) - 1;
 	if (last_feature_nr < FIRST_EXTENDED_XFEATURE)
-		return size;
+		return FXSAVE_SIZE + XSAVE_HDR_SIZE;
 
 	/*
 	 * Each state offset in the non-compacted format is fixed. Take the
@@ -182,15 +208,8 @@ unsigned int calculate_xstate_buf_size_from_mask(u64 mask)
 	 * With the given mask, no relevant size is found so far. So,
 	 * calculate it by summing up each state size.
 	 */
-	for (i = FIRST_EXTENDED_XFEATURE; i <= last_feature_nr; i++) {
-		if (!(mask & BIT_ULL(i)))
-			continue;
-
-		if (xstate_64byte_aligned[i])
-			size = ALIGN(size, 64);
-		size += xstate_sizes[i];
-	}
-	return size;
+	offset = get_xstate_comp_offset(mask, last_feature_nr);
+	return offset + xstate_sizes[last_feature_nr];
 }
 
 /*
@@ -943,19 +962,29 @@ void fpu__resume_cpu(void)
  */
 static void *__raw_xsave_addr(struct fpu *fpu, int xfeature_nr)
 {
+	unsigned int offset;
 	void *xsave;
 
 	if (!xfeature_enabled(xfeature_nr)) {
-		WARN_ON_FPU(1);
-		return NULL;
-	}
+		goto not_found;
+	} else if (!fpu) {
+		xsave = &init_fpstate.xsave;
 
-	if (fpu)
+		offset = get_xstate_comp_offset(xfeatures_mask_all, xfeature_nr);
+		if (offset > sizeof(init_fpstate))
+			goto not_found;
+	} else if (!(fpu->state_mask & BIT_ULL(xfeature_nr))) {
+		goto not_found;
+	} else {
 		xsave = &fpu->state->xsave;
-	else
-		xsave = &init_fpstate.xsave;
+		offset = get_xstate_comp_offset(fpu->state_mask, xfeature_nr);
+	}
+
+	return xsave + offset;
 
-	return xsave + xstate_comp_offsets[xfeature_nr];
+not_found:
+	WARN_ON_FPU(1);
+	return NULL;
 }
 /*
  * Given the xsave area and a state inside, this function returns the
-- 
2.17.1



* [PATCH v11 12/29] x86/fpu/xstate: Update the XSTATE context copy function to support dynamic states
  2021-10-01 22:36 [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (10 preceding siblings ...)
  2021-10-01 22:37 ` [PATCH v11 11/29] x86/fpu/xstate: Update the XSTATE buffer address finder " Chang S. Bae
@ 2021-10-01 22:37 ` Chang S. Bae
  2021-10-01 22:37 ` [PATCH v11 13/29] x86/fpu/xstate: Use feature disable (XFD) to protect dynamic user state Chang S. Bae
                   ` (19 subsequent siblings)
  31 siblings, 0 replies; 45+ messages in thread
From: Chang S. Bae @ 2021-10-01 22:37 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, lenb, dave.hansen, thiago.macieira, jing2.liu,
	ravi.v.shankar, linux-kernel, chang.seok.bae

The ptrace() and signal-return paths use the XSTATE context copy functions,
which let callers read XSTATE values from the target's buffer. With dynamic
user states, a component's position in the buffer may vary, and the init
fpstate is not always large enough to cover every state.

Introduce a new helper, copy_extended_feature(), that locates the correct
source for each component and falls back to the init value or zeros when
the component is in its init state.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v9:
* Refactor the new code in the loop. (Borislav Petkov)
* Move out the copy_uabi_to_xstate() changes (to patches 1 and 9).
  (Borislav Petkov)

Changes from v5:
* Updated to ensure xstate_bv aligned with the target.
* Open-coded the xstate copy loop for the ptrace() read path.
* Adjusted the changelog.

Changes from v3:
* Cleaned up the code change with more comments.
* Removed 'no functional change' in the changelog. (Borislav Petkov)

Changes from v2:
* Updated the changelog with task->fpu removed. (Borislav Petkov)
---
 arch/x86/kernel/fpu/xstate.c | 29 +++++++++++++++++++++++++----
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 2cb0d8c2eeeb..34cd131f5476 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1153,6 +1153,30 @@ static void copy_feature(bool from_xstate, struct membuf *to, void *xstate,
 	membuf_write(to, from_xstate ? xstate : init_xstate, size);
 }
 
+static void copy_extended_feature(struct membuf *to, struct fpu *fpu,
+				  struct xstate_header *hdr,
+				  int feature_nr)
+{
+	unsigned int size = xstate_sizes[feature_nr];
+	u64 mask = BIT_ULL(feature_nr);
+	void *from = NULL;
+
+	/*
+	 * Copy from the XSTATE buffer if available. Otherwise, write the
+	 * init value as recorded for legacy states (FP and SSE) or as
+	 * zeros for others.
+	 */
+	if (hdr->xfeatures & mask) {
+		from = __raw_xsave_addr(fpu, feature_nr);
+		membuf_write(to, from, size);
+	} else if (XFEATURE_MASK_FPSSE & mask) {
+		from = __raw_xsave_addr(NULL, feature_nr);
+		membuf_write(to, from, size);
+	} else {
+		membuf_zero(to, size);
+	}
+}
+
 /**
  * copy_xstate_to_uabi_buf - Copy kernel saved xstate to a UABI buffer
  * @to:		membuf descriptor
@@ -1254,10 +1278,7 @@ void copy_xstate_to_uabi_buf(struct membuf to, struct task_struct *tsk,
 			pkru.pkru = tsk->thread.pkru;
 			membuf_write(&to, &pkru, sizeof(pkru));
 		} else {
-			copy_feature(header.xfeatures & BIT_ULL(i), &to,
-				     __raw_xsave_addr(&tsk->thread.fpu, i),
-				     __raw_xsave_addr(NULL, i),
-				     xstate_sizes[i]);
+			copy_extended_feature(&to, &tsk->thread.fpu, &header, i);
 		}
 		/*
 		 * Keep track of the last copied state in the non-compacted
-- 
2.17.1



* [PATCH v11 13/29] x86/fpu/xstate: Use feature disable (XFD) to protect dynamic user state
  2021-10-01 22:36 [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (11 preceding siblings ...)
  2021-10-01 22:37 ` [PATCH v11 12/29] x86/fpu/xstate: Update the XSTATE context copy function " Chang S. Bae
@ 2021-10-01 22:37 ` Chang S. Bae
  2021-10-01 22:37 ` [PATCH v11 14/29] x86/fpu/xstate: Support ptracer-induced XSTATE buffer expansion Chang S. Bae
                   ` (18 subsequent siblings)
  31 siblings, 0 replies; 45+ messages in thread
From: Chang S. Bae @ 2021-10-01 22:37 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, lenb, dave.hansen, thiago.macieira, jing2.liu,
	ravi.v.shankar, linux-kernel, chang.seok.bae

Intel's Extended Feature Disable (XFD) feature is an extension of the XSAVE
architecture. XFD allows the kernel to enable a feature state in XCR0 and
to receive a #NM trap when a task uses instructions accessing that state.
In this way, Linux can defer allocating the large XSAVE buffer until tasks
need it.

XFD introduces two MSRs: IA32_XFD to enable or disable the feature and
IA32_XFD_ERR to assist the #NM trap handler. Both use the same
xstate-component bitmap format as XCR0.

Use this hardware capability to determine when to expand the XSTATE
buffer: the #NM handler triggers the expansion.

Introduce a helper function that switches the IA32_XFD MSR on context
switch.

If vzalloc() fails, send SIGSEGV.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v10:
* Raise SIGSEGV rather than SIGILL when XSTATE buffer reallocation fails.
  (Thiago Macieira)

Changes from v9:
* Mask the XFD flag from /proc/cpuinfo. (Borislav Petkov)
* Remove most helpers. (Borislav Petkov)
* Refactor the XFD handling code. (Borislav Petkov)
* Update the feature enumeration ordering. (Borislav Petkov)
* Rename the XFD support helper. (Borislav Petkov)
* Update the print message for dynamic states. (Borislav Petkov)
* Adjust the changelog.
* Use cpu_feature_enabled() wherever possible. (Borislav Petkov)

Changes from v7:
* Update #NM handler to raise SIGILL rather than SIGSEGV. (Thiago
  Macieira)

Changes from v6:
* Update the #NM handler a little bit.
* Clean up the code comment.

Changes from v5:
* Excluded the access request check here and included the buffer allocation
  again in the #NM handler. The access request will be dealt with in the next patch.
* Updated the title. (Dave Hansen)
* Updated the code comment.

Changes from v4:
* Changed to use XFD to support the access request policy. Updated the #NM
  handler to raise a signal instead of allocating the buffer.
* Decoupled XFD from the use of XSAVE compacted format.
* Updated helper functions.
* Updated function descriptions in a proper format.
* Updated some code comments.

Changes from v3:
* Removed 'no functional change' in the changelog. (Borislav Petkov)

Changes from v2:
* Changed to enable XFD only when the compacted format is used.
* Updated the changelog with task->fpu removed. (Borislav Petkov)

Changes from v1:
* Inlined the XFD-induced #NM handling code (Andy Lutomirski)
---
 arch/x86/include/asm/cpufeatures.h  |  1 +
 arch/x86/include/asm/fpu/internal.h | 25 +++++++++++++--
 arch/x86/include/asm/msr-index.h    |  2 ++
 arch/x86/kernel/cpu/cpuid-deps.c    |  1 +
 arch/x86/kernel/fpu/xstate.c        | 46 +++++++++++++++++++++++++--
 arch/x86/kernel/process.c           | 10 ++++++
 arch/x86/kernel/process_32.c        |  2 +-
 arch/x86/kernel/process_64.c        |  2 +-
 arch/x86/kernel/traps.c             | 49 +++++++++++++++++++++++++++++
 9 files changed, 131 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index d0ce5cfd3ac1..ab7b3a2de85d 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -277,6 +277,7 @@
 #define X86_FEATURE_XSAVEC		(10*32+ 1) /* XSAVEC instruction */
 #define X86_FEATURE_XGETBV1		(10*32+ 2) /* XGETBV with ECX = 1 instruction */
 #define X86_FEATURE_XSAVES		(10*32+ 3) /* XSAVES/XRSTORS instructions */
+#define X86_FEATURE_XFD			(10*32+ 4) /* "" eXtended Feature Disabling */
 
 /*
  * Extended auxiliary flags: Linux defined - for features scattered in various
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 263e349ff85a..1aa8bc75b24d 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -535,14 +535,35 @@ static inline void switch_fpu_prepare(struct fpu *old_fpu, int cpu)
  * Misc helper functions:
  */
 
+/**
+ * xfd_switch - Switches the MSR IA32_XFD context if needed.
+ * @prev:	The previous task's struct fpu pointer
+ * @next:	The next task's struct fpu pointer
+ */
+static inline void xfd_switch(struct fpu *prev, struct fpu *next)
+{
+	u64 prev_xfd_mask, next_xfd_mask;
+
+	if (!cpu_feature_enabled(X86_FEATURE_XFD) || !xfeatures_mask_user_dynamic)
+		return;
+
+	prev_xfd_mask = prev->state_mask & xfeatures_mask_user_dynamic;
+	next_xfd_mask = next->state_mask & xfeatures_mask_user_dynamic;
+
+	if (unlikely(prev_xfd_mask != next_xfd_mask))
+		wrmsrl_safe(MSR_IA32_XFD, xfeatures_mask_user_dynamic ^ next_xfd_mask);
+}
+
 /*
  * Delay loading of the complete FPU state until the return to userland.
  * PKRU is handled separately.
  */
-static inline void switch_fpu_finish(struct fpu *new_fpu)
+static inline void switch_fpu_finish(struct fpu *old_fpu, struct fpu *new_fpu)
 {
-	if (cpu_feature_enabled(X86_FEATURE_FPU))
+	if (cpu_feature_enabled(X86_FEATURE_FPU)) {
 		set_thread_flag(TIF_NEED_FPU_LOAD);
+		xfd_switch(old_fpu, new_fpu);
+	}
 }
 
 #endif /* _ASM_X86_FPU_INTERNAL_H */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index a7c413432b33..01e2650b9585 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -625,6 +625,8 @@
 
 #define MSR_IA32_BNDCFGS_RSVD		0x00000ffc
 
+#define MSR_IA32_XFD			0x000001c4
+#define MSR_IA32_XFD_ERR		0x000001c5
 #define MSR_IA32_XSS			0x00000da0
 
 #define MSR_IA32_APICBASE		0x0000001b
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index defda61f372d..7f891d2eb52e 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -75,6 +75,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_SGX_LC,			X86_FEATURE_SGX	      },
 	{ X86_FEATURE_SGX1,			X86_FEATURE_SGX       },
 	{ X86_FEATURE_SGX2,			X86_FEATURE_SGX1      },
+	{ X86_FEATURE_XFD,			X86_FEATURE_XSAVE     },
 	{}
 };
 
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 34cd131f5476..a519fe143adf 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -139,6 +139,27 @@ static bool xfeature_is_supervisor(int xfeature_nr)
 	return ecx & 1;
 }
 
+/**
+ * xfeature_supports_xfd - Check if the feature supports Extended Feature
+ *			   Disable (XFD).
+ * @feature_nr:	The feature number.
+ *
+ * Returns:	True if supported; otherwise, false.
+ */
+static bool xfeature_supports_xfd(int feature_nr)
+{
+	u32 eax, ebx, ecx, edx;
+
+	if (!cpu_feature_enabled(X86_FEATURE_XFD))
+		return false;
+
+	/*
+	 * If state component 'i' supports it, ECX[2] returns 1; otherwise, 0.
+	 */
+	cpuid_count(XSTATE_CPUID, feature_nr, &eax, &ebx, &ecx, &edx);
+	return ecx & 4;
+}
+
 /**
  * get_xstate_comp_offset - Find the feature offset in the compacted format.
  * @mask:	The set of components located in the compacted format
@@ -237,6 +258,9 @@ void fpu__init_cpu_xstate(void)
 		wrmsrl(MSR_IA32_XSS, xfeatures_mask_supervisor() |
 				     xfeatures_mask_independent());
 	}
+
+	if (boot_cpu_has(X86_FEATURE_XFD))
+		wrmsrl(MSR_IA32_XFD, xfeatures_mask_user_dynamic);
 }
 
 static bool xfeature_enabled(enum xfeature xfeature)
@@ -434,8 +458,9 @@ static void __init print_xstate_offset_size(void)
 	for (i = FIRST_EXTENDED_XFEATURE; i < XFEATURE_MAX; i++) {
 		if (!xfeature_enabled(i))
 			continue;
-		pr_info("x86/fpu: xstate_offset[%d]: %4d, xstate_sizes[%d]: %4d\n",
-			 i, xstate_comp_offsets[i], i, xstate_sizes[i]);
+		pr_info("x86/fpu: xstate_offset[%d]: %4d, xstate_sizes[%d]: %4d %s\n",
+			i, xstate_comp_offsets[i], i, xstate_sizes[i],
+			(xfeatures_mask_user_dynamic & BIT_ULL(i)) ? "(dynamic)" : "");
 	}
 }
 
@@ -878,9 +903,19 @@ void __init fpu__init_system_xstate(void)
 
 	/* Store it for paranoia check at the end */
 	xfeatures = xfeatures_mask_all;
-	/* Do not support the dynamically allocated buffer yet. */
+
 	xfeatures_mask_user_dynamic = 0;
 
+	for (i = FIRST_EXTENDED_XFEATURE; i < XFEATURE_MAX; i++) {
+		u64 feature_mask = BIT_ULL(i);
+
+		if (!(xfeatures_mask_uabi() & feature_mask))
+			continue;
+
+		if (xfeature_supports_xfd(i))
+			xfeatures_mask_user_dynamic |= feature_mask;
+	}
+
 	/* Enable xstate instructions to be able to continue with initialization: */
 	fpu__init_cpu_xstate();
 	err = init_xstate_size();
@@ -945,6 +980,11 @@ void fpu__resume_cpu(void)
 		wrmsrl(MSR_IA32_XSS, xfeatures_mask_supervisor()  |
 				     xfeatures_mask_independent());
 	}
+
+	if (cpu_feature_enabled(X86_FEATURE_XFD))
+		wrmsrl_safe(MSR_IA32_XFD, (current->thread.fpu.state_mask &
+					   xfeatures_mask_user_dynamic) ^
+					  xfeatures_mask_user_dynamic);
 }
 
 /**
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 33f5d8d07367..7471102e2bed 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -97,6 +97,16 @@ void arch_thread_struct_whitelist(unsigned long *offset, unsigned long *size)
 	*size = fpu_buf_cfg.min_size;
 }
 
+void arch_release_task_struct(struct task_struct *task)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_FPU))
+		return;
+
+	/* Free up only the dynamically-allocated memory. */
+	if (task->thread.fpu.state != &task->thread.fpu.__default_state)
+		free_xstate_buffer(task->thread.fpu.state);
+}
+
 /*
  * Free thread data structures etc..
  */
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 4f2f54e1281c..7bd5d08eeb41 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -213,7 +213,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 
 	this_cpu_write(current_task, next_p);
 
-	switch_fpu_finish(next_fpu);
+	switch_fpu_finish(prev_fpu, next_fpu);
 
 	/* Load the Intel cache allocation PQR MSR. */
 	resctrl_sched_in();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index ec0d836a13b1..41c9855158d6 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -620,7 +620,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	this_cpu_write(current_task, next_p);
 	this_cpu_write(cpu_current_top_of_stack, task_top_of_stack(next_p));
 
-	switch_fpu_finish(next_fpu);
+	switch_fpu_finish(prev_fpu, next_fpu);
 
 	/* Reload sp0. */
 	update_task_stack(next_p);
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index a58800973aed..08fb461fc3e5 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -1108,10 +1108,59 @@ DEFINE_IDTENTRY(exc_spurious_interrupt_bug)
 	 */
 }
 
+static __always_inline bool handle_xfd_event(struct fpu *fpu)
+{
+	bool handled = false;
+	u64 xfd_err;
+
+	if (!cpu_feature_enabled(X86_FEATURE_XFD))
+		return handled;
+
+	rdmsrl_safe(MSR_IA32_XFD_ERR, &xfd_err);
+	wrmsrl_safe(MSR_IA32_XFD_ERR, 0);
+
+	if (xfd_err) {
+		u64 xfd_event = xfd_err & xfeatures_mask_user_dynamic;
+		u64 value;
+
+		if (WARN_ON(!xfd_event)) {
+			/*
+			 * Unexpected event is raised. But update XFD state to
+			 * unblock the task.
+			 */
+			rdmsrl_safe(MSR_IA32_XFD, &value);
+			wrmsrl_safe(MSR_IA32_XFD, value & ~xfd_err);
+		} else {
+			struct fpu *fpu = &current->thread.fpu;
+			int err = -1;
+
+			/*
+			 * This is a trap from userspace, so it must not
+			 * arrive in interrupt context.
+			 */
+			if (!WARN_ON(in_interrupt())) {
+				err = realloc_xstate_buffer(fpu, xfd_event);
+				if (!err)
+					wrmsrl_safe(MSR_IA32_XFD, (fpu->state_mask &
+								   xfeatures_mask_user_dynamic) ^
+								  xfeatures_mask_user_dynamic);
+			}
+
+			if (err)
+				force_sig(SIGSEGV);
+		}
+		handled = true;
+	}
+	return handled;
+}
+
 DEFINE_IDTENTRY(exc_device_not_available)
 {
 	unsigned long cr0 = read_cr0();
 
+	if (handle_xfd_event(&current->thread.fpu))
+		return;
+
 #ifdef CONFIG_MATH_EMULATION
 	if (!boot_cpu_has(X86_FEATURE_FPU) && (cr0 & X86_CR0_EM)) {
 		struct math_emu_info info = { };
-- 
2.17.1



* [PATCH v11 14/29] x86/fpu/xstate: Support ptracer-induced XSTATE buffer expansion
  2021-10-01 22:36 [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (12 preceding siblings ...)
  2021-10-01 22:37 ` [PATCH v11 13/29] x86/fpu/xstate: Use feature disable (XFD) to protect dynamic user state Chang S. Bae
@ 2021-10-01 22:37 ` Chang S. Bae
  2021-10-01 22:37 ` [PATCH v11 15/29] x86/arch_prctl: Create ARCH_SET_STATE_ENABLE/ARCH_GET_STATE_ENABLE Chang S. Bae
                   ` (17 subsequent siblings)
  31 siblings, 0 replies; 45+ messages in thread
From: Chang S. Bae @ 2021-10-01 22:37 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, lenb, dave.hansen, thiago.macieira, jing2.liu,
	ravi.v.shankar, linux-kernel, chang.seok.bae

ptrace() may update XSTATE data before the target task has taken an XFD
fault and expanded the XSTATE buffer. Detect this case and allocate a
sufficient buffer to support the request. Also, disable the (now
unnecessary) associated first-use fault.
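
The detection step reads XSTATE_BV, the first 64-bit word of the XSAVE header, from the image supplied by the ptracer and masks it with the dynamic features. The following is a minimal userspace sketch of that check. It assumes the standard (non-compacted) layout, in which the header starts at byte 512. `xstate_bv_dynamic()` is a hypothetical helper, not a kernel function.

```c
#include <stdint.h>
#include <string.h>

/* Standard-format XSAVE area: 512-byte legacy region, then the 64-byte header. */
#define XSAVE_HDR_OFFSET 512u

/*
 * Hypothetical helper mirroring the xstateregs_set() check: return the
 * dynamic features that the supplied XSAVE image claims to contain.
 * A non-zero result means the target's buffer must be expanded first.
 */
static uint64_t xstate_bv_dynamic(const void *xsave, uint64_t dynamic_mask)
{
	uint64_t xstate_bv;

	/* XSTATE_BV is bytes 0-7 of the header; memcpy() avoids unaligned loads. */
	memcpy(&xstate_bv, (const unsigned char *)xsave + XSAVE_HDR_OFFSET,
	       sizeof(xstate_bv));
	return xstate_bv & dynamic_mask;
}
```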

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v9:
* Simplify the code further. (Borislav Petkov)

Changes from v5:
* Adjusted to use 'tmpbuf' for the new base code.

Changes from v4:
* Improved the condition check for the expansion.
* Simplified the XSTATE_BV retrieval.
* Updated the code comment.

Changes from v3:
* Removed 'no functional changes' in the changelog. (Borislav Petkov)

Changes from v2:
* Updated the changelog with task->fpu removed. (Borislav Petkov)
* Updated the code comments.
---
 arch/x86/kernel/fpu/regset.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index 7ea10f98c2b0..c57ad37a95fe 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -163,6 +163,27 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
 		}
 	}
 
+	/*
+	 * When a ptracer attempts to write any dynamic user state in the
+	 * target buffer but not sufficiently allocated, it dynamically
+	 * expands the buffer.
+	 */
+	if (xfeatures_mask_user_dynamic) {
+		u64 state_mask;
+
+		/* Retrieve XSTATE_BV. */
+		memcpy(&state_mask, (kbuf ?: tmpbuf) + offsetof(struct xregs_state, header),
+		       sizeof(u64));
+
+		/* Expand the xstate buffer based on the XSTATE_BV. */
+		state_mask &= xfeatures_mask_user_dynamic;
+		if (state_mask) {
+			ret = realloc_xstate_buffer(fpu, state_mask);
+			if (ret)
+				goto out;
+		}
+	}
+
 	fpu_force_restore(fpu);
 	ret = copy_uabi_from_kernel_to_xstate(fpu, kbuf ?: tmpbuf);
 
-- 
2.17.1



* [PATCH v11 15/29] x86/arch_prctl: Create ARCH_SET_STATE_ENABLE/ARCH_GET_STATE_ENABLE
  2021-10-01 22:36 [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (13 preceding siblings ...)
  2021-10-01 22:37 ` [PATCH v11 14/29] x86/fpu/xstate: Support ptracer-induced XSTATE buffer expansion Chang S. Bae
@ 2021-10-01 22:37 ` Chang S. Bae
  2021-10-05  0:30   ` Thomas Gleixner
  2021-10-01 22:37 ` [PATCH v11 16/29] x86/fpu/xstate: Support both legacy and expanded signal XSTATE size Chang S. Bae
                   ` (16 subsequent siblings)
  31 siblings, 1 reply; 45+ messages in thread
From: Chang S. Bae @ 2021-10-01 22:37 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, lenb, dave.hansen, thiago.macieira, jing2.liu,
	ravi.v.shankar, linux-kernel, chang.seok.bae

arch_prctl(ARCH_SET_STATE_ENABLE, u64 bitmask)
    Some XSTATE features, such as AMX, are unavailable to applications
    until the process explicitly requests them via this call. Requests can
    be made for any number of valid user XSTATE features in a single call.
    This call is intended to be invoked very early in process
    initialization. A forked child inherits access, but permission is
    reset upon exec. Once granted, XSTATE access cannot be revoked.
    Return codes:
        0: success (including repeated calls)
        EINVAL: the request includes a feature not supported by the hardware

arch_prctl(ARCH_GET_STATE_ENABLE, u64 *bitmask)
    Store the bitmask of permitted user XSTATE features at *bitmask. If
    XSAVE is disabled, the bitmask indicates only legacy states.
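
The following is a hedged userspace sketch of calling the two options through syscall(2). The option values are copied from this series' uapi/asm/prctl.h and are only meaningful on a kernel that has this v11 series applied. The wrapper names are hypothetical. `all_permitted()` repeats the bit test that the kernel's state_permitted() performs.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Values from this series' uapi header; valid only with v11 applied. */
#define ARCH_SET_STATE_ENABLE 0x1021
#define ARCH_GET_STATE_ENABLE 0x1022

/* Hypothetical wrapper: request access to @mask; 0, or -1 with errno set. */
long state_enable(uint64_t mask)
{
	return syscall(SYS_arch_prctl, ARCH_SET_STATE_ENABLE, (unsigned long)mask);
}

/* Hypothetical wrapper: the kernel stores the permitted bitmap in *@mask. */
long state_get(uint64_t *mask)
{
	return syscall(SYS_arch_prctl, ARCH_GET_STATE_ENABLE, (unsigned long)mask);
}

/* Same test as state_permitted(): every wanted bit must be granted. */
static int all_permitted(uint64_t granted, uint64_t wanted)
{
	return (granted & wanted) == wanted;
}
```

A process would call `state_enable(1ULL << 18)` early in `main()`, before touching tile data, and then confirm the grant with `state_get()`. Treating bit 18 as XTILEDATA is an assumption here. It follows the Intel SDM state-component numbering, not this patch.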

The permission is checked at every XSTATE buffer expansion, i.e. on an
XFD-induced #NM exception and on XSTATE injection by a ptracer. When
permission is missing, userspace is notified with SIGILL (#NM) or with an
error code (-EFAULT from ptrace).
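
The #NM-side decisions can be summarized as a pure function. The following is a sketch only: `classify_xfd_fault()` and its enum are hypothetical names that model what handle_xfd_event() does once the permission check is added. The MSR accesses, the in_interrupt() check and the actual allocation are left out.

```c
#include <stdint.h>

enum xfd_action {
	XFD_NONE,	/* XFD_ERR clear: not an XFD fault, legacy #NM handling */
	XFD_SPURIOUS,	/* unexpected bits: WARN, clear them in XFD to unblock */
	XFD_SIGILL,	/* dynamic feature touched without permission */
	XFD_EXPAND,	/* permitted: expand the buffer (SIGSEGV if that fails) */
};

/*
 * Hypothetical model: @xfd_err is the IA32_XFD_ERR value, @dynamic_mask
 * stands for xfeatures_mask_user_dynamic, @perm for the group permission.
 */
static enum xfd_action classify_xfd_fault(uint64_t xfd_err,
					  uint64_t dynamic_mask, uint64_t perm)
{
	uint64_t event;

	if (!xfd_err)
		return XFD_NONE;
	event = xfd_err & dynamic_mask;
	if (!event)
		return XFD_SPURIOUS;
	if ((event & perm) != event)
		return XFD_SIGILL;
	return XFD_EXPAND;
}
```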

For "dynamic" XSTATE features that have XFD hardware support, the kernel
can prevent users from touching state without permission. For state
without XFD support, the kernel cannot prevent a user from touching
that state.

Granted permission is recorded only in the thread group leader. A new
task copies the current group permission bitmask.
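
The following is a minimal userspace model of where the permission lives. It assumes a simplified task structure; `struct task_model` and all helper names are hypothetical. Any thread's grant lands in the leader, readers always go through the leader (like get_group_state_perm()), clone copies the group value (like fpu_clone()), and exec resets it.

```c
#include <stdint.h>

/* Hypothetical model with only the fields relevant here. */
struct task_model {
	struct task_model *group_leader;	/* points to itself for a leader */
	uint64_t state_perm;			/* only the leader's copy is authoritative */
};

/* Like get_group_state_perm(): always read the leader's bitmap. */
static uint64_t group_perm(const struct task_model *t)
{
	return t->group_leader->state_perm;
}

/* ARCH_SET_STATE_ENABLE from any thread ORs bits into the leader. */
static void grant_perm(struct task_model *t, uint64_t mask)
{
	t->group_leader->state_perm |= mask;
}

/* Like fpu_clone(): the new task copies the current group permission. */
static void clone_perm(struct task_model *child, struct task_model *leader,
		       const struct task_model *parent)
{
	child->group_leader = leader;
	child->state_perm = group_perm(parent);
}

/* At exec: reset to the features that need no permission. */
static void exec_reset_perm(struct task_model *t, uint64_t default_perm)
{
	t->state_perm = default_perm;
}
```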

Rename the third argument for do_arch_prctl_common() to reflect its generic
use.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v10:
* Expand permission-required states rather than just XFD-protected states.
* Simplify syscall implementation - no functional change.
* Fix the changelog and the comment for ARCH_SET_STATE_ENABLE.

Changes from v9:
* Simplify and improve the implementation.
* Use cpu_feature_enabled() instead of boot_cpu_has(). (Borislav Petkov)

Changes from v8:
* Update arch_prctl prototype for consistency with other arch_prctl's. It
  now takes the address of the returned bitmask as a parameter.
* Optimize the reset function.

Changes from v7:
* Rename the syscalls. (Thiago Macieira and Dave Hansen)
* If XSAVE is disabled, ensure that the syscall correctly indicates legacy
  states. (Thiago Macieira and Dave Hansen)

Changes from v6:
* Add state bitmap param to proposed syscall. (Thiago Macieira)
* Add companion syscall to return the current permission bitmap.
* Update the ptrace path to return EFAULT when no permission to write
  XTILEDATA.
* Update do_arch_prctl_common().

Changes from v5:
* Switched to per-process permission. (Based on the discussion on LKML)
---
 arch/x86/include/asm/fpu/types.h  | 11 +++++
 arch/x86/include/asm/fpu/xstate.h | 35 +++++++++++++
 arch/x86/include/asm/proto.h      |  2 +-
 arch/x86/include/uapi/asm/prctl.h |  3 ++
 arch/x86/kernel/fpu/core.c        |  7 +--
 arch/x86/kernel/fpu/regset.c      | 17 ++++---
 arch/x86/kernel/fpu/xstate.c      | 81 ++++++++++++++++++++++++++++++-
 arch/x86/kernel/process.c         |  7 ++-
 arch/x86/kernel/process_64.c      |  4 ++
 arch/x86/kernel/traps.c           | 35 ++++++++-----
 10 files changed, 175 insertions(+), 27 deletions(-)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 0cc9f6c5a10c..617184b0afec 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -336,6 +336,17 @@ struct fpu {
 	 */
 	unsigned long			avx512_timestamp;
 
+	/*
+	 * @state_perm:
+	 *
+	 * This bitmap indicates the permission for state components.
+	 *
+	 * Always reference group_leader's value via
+	 * get_group_state_perm() as it readily represents the process's
+	 * state permission.
+	 */
+	u64				state_perm;
+
 	/*
 	 * @state_mask:
 	 *
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 9574ee20c6aa..1f62a38e4ae1 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -8,6 +8,7 @@
 #include <asm/processor.h>
 #include <asm/fpu/api.h>
 #include <asm/user.h>
+#include <asm/prctl.h>
 
 /* Bit 63 of XCR0 is reserved for future expansion */
 #define XFEATURE_MASK_EXTEND	(~(XFEATURE_MASK_FPSSE | (1ULL << 63)))
@@ -35,6 +36,9 @@
 				      XFEATURE_MASK_BNDREGS | \
 				      XFEATURE_MASK_BNDCSR)
 
+/* Require ARCH_SET_STATE_ENABLE for future features  */
+#define XFEATURE_MASK_PERMISSION_REQUIRED GENMASK_ULL(63, XFEATURE_MAX)
+
 /*
  * Features which are restored when returning to user space.
  * PKRU is not restored on return to user space because PKRU
@@ -100,6 +104,11 @@ static inline u64 xfeatures_mask_uabi(void)
 	return xfeatures_mask_all & XFEATURE_MASK_USER_SUPPORTED;
 }
 
+static inline u64 xfeatures_mask_user_perm(void)
+{
+	return xfeatures_mask_uabi() & XFEATURE_MASK_PERMISSION_REQUIRED;
+}
+
 /*
  * The xfeatures which are restored by the kernel when returning to user
  * mode. This is not necessarily the same as xfeatures_mask_uabi() as the
@@ -157,6 +166,32 @@ unsigned int calculate_xstate_buf_size_from_mask(u64 mask);
 void *get_xsave_addr(struct fpu *fpu, int xfeature_nr);
 int realloc_xstate_buffer(struct fpu *fpu, u64 mask);
 void free_xstate_buffer(union fpregs_state *state);
+
+/**
+ * get_group_state_perm - Get a per-process state permission
+ * @tsk:	A struct task_struct * pointer
+ * Return:	A bitmap to indicate state permission.
+ */
+static inline u64 get_group_state_perm(struct task_struct *tsk)
+{
+	return tsk->group_leader->thread.fpu.state_perm;
+}
+
+/**
+ * state_permitted - Check a task's permission for indicated features.
+ * @tsk:	A struct task_struct * pointer
+ * @state_mask:	A bitmap of queried features
+ * Return:	True if all of the queried features are permitted;
+ *		otherwise, false.
+ */
+static inline bool state_permitted(struct task_struct *tsk, u64 state_mask)
+{
+	return ((state_mask & get_group_state_perm(tsk)) == state_mask);
+}
+
+void reset_state_perm(struct task_struct *tsk);
+long do_arch_prctl_state(struct task_struct *tsk, int option, unsigned long arg2);
+
 int xfeature_size(int xfeature_nr);
 int copy_uabi_from_kernel_to_xstate(struct fpu *fpu, const void *kbuf);
 int copy_sigframe_from_user_to_xstate(struct fpu *fpu, const void __user *ubuf);
diff --git a/arch/x86/include/asm/proto.h b/arch/x86/include/asm/proto.h
index 8c5d1910a848..feed36d44d04 100644
--- a/arch/x86/include/asm/proto.h
+++ b/arch/x86/include/asm/proto.h
@@ -40,6 +40,6 @@ void x86_report_nx(void);
 extern int reboot_force;
 
 long do_arch_prctl_common(struct task_struct *task, int option,
-			  unsigned long cpuid_enabled);
+			  unsigned long arg2);
 
 #endif /* _ASM_X86_PROTO_H */
diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
index 5a6aac9fa41f..c73e141ce90a 100644
--- a/arch/x86/include/uapi/asm/prctl.h
+++ b/arch/x86/include/uapi/asm/prctl.h
@@ -10,6 +10,9 @@
 #define ARCH_GET_CPUID		0x1011
 #define ARCH_SET_CPUID		0x1012
 
+#define ARCH_SET_STATE_ENABLE	0x1021
+#define ARCH_GET_STATE_ENABLE	0x1022
+
 #define ARCH_MAP_VDSO_X32	0x2001
 #define ARCH_MAP_VDSO_32	0x2002
 #define ARCH_MAP_VDSO_64	0x2003
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 164e75c37dbb..b722360d2a85 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -274,10 +274,11 @@ int fpu_clone(struct task_struct *dst)
 		return 0;
 
 	/*
-	 * The child does not inherit the dynamic states. Thus, use the
-	 * buffer embedded in struct task_struct, which has the minimum
-	 * size.
+	 * The child does not inherit the dynamic states but permission.
+	 * Use the buffer embedded in struct task_struct, which has the
+	 * minimum size.
 	 */
+	dst_fpu->state_perm = get_group_state_perm(current);
 	dst_fpu->state_mask = (xfeatures_mask_all & ~xfeatures_mask_user_dynamic);
 	dst_fpu->state = &dst_fpu->__default_state;
 	/*
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index c57ad37a95fe..14bfbf380015 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -166,7 +166,7 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
 	/*
 	 * When a ptracer attempts to write any dynamic user state in the
 	 * target buffer but not sufficiently allocated, it dynamically
-	 * expands the buffer.
+	 * expands the buffer if permitted.
 	 */
 	if (xfeatures_mask_user_dynamic) {
 		u64 state_mask;
@@ -175,13 +175,16 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
 		memcpy(&state_mask, (kbuf ?: tmpbuf) + offsetof(struct xregs_state, header),
 		       sizeof(u64));
 
-		/* Expand the xstate buffer based on the XSTATE_BV. */
-		state_mask &= xfeatures_mask_user_dynamic;
-		if (state_mask) {
-			ret = realloc_xstate_buffer(fpu, state_mask);
-			if (ret)
-				goto out;
+		/* Check the permission. */
+		if (!state_permitted(target, state_mask)) {
+			ret = -EFAULT;
+			goto out;
 		}
+
+		/* Expand the xstate buffer based on the XSTATE_BV. */
+		ret = realloc_xstate_buffer(fpu, state_mask);
+		if (ret)
+			goto out;
 	}
 
 	fpu_force_restore(fpu);
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index a519fe143adf..38c003f840b4 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -912,7 +912,8 @@ void __init fpu__init_system_xstate(void)
 		if (!(xfeatures_mask_uabi() & feature_mask))
 			continue;
 
-		if (xfeature_supports_xfd(i))
+		/* The kernel needs to mandate the dynamic state. */
+		if (xfeature_supports_xfd(i) && (feature_mask & xfeatures_mask_user_perm()))
 			xfeatures_mask_user_dynamic |= feature_mask;
 	}
 
@@ -927,6 +928,7 @@ void __init fpu__init_system_xstate(void)
 	 * dynamic states.
 	 */
 	current->thread.fpu.state_mask = (xfeatures_mask_all & ~xfeatures_mask_user_dynamic);
+	current->thread.fpu.state_perm = (xfeatures_mask_all & ~xfeatures_mask_user_perm());
 
 	/*
 	 * Update info used for ptrace frames; use standard-format size and no
@@ -1187,6 +1189,83 @@ int realloc_xstate_buffer(struct fpu *fpu, u64 mask)
 	return 0;
 }
 
+/**
+ * reset_state_perm - Reset a task's permission for dynamic user state
+ *
+ * It is expected to call at exec in which one task runs in a process.
+ *
+ * @task:	A struct task_struct * pointer
+ */
+void reset_state_perm(struct task_struct *tsk)
+{
+	struct fpu *fpu = &tsk->thread.fpu;
+
+	fpu->state_perm = xfeatures_mask_all & ~xfeatures_mask_user_perm();
+
+	if (!xfeatures_mask_user_dynamic ||
+	    !(fpu->state_mask & xfeatures_mask_user_dynamic))
+		return;
+
+	WARN_ON(tsk->signal->nr_threads > 1);
+
+	fpu->state_mask = (xfeatures_mask_all & ~xfeatures_mask_user_dynamic);
+	free_xstate_buffer(fpu->state);
+	fpu->state = &fpu->__default_state;
+	if (cpu_feature_enabled(X86_FEATURE_XSAVES))
+		fpstate_init_xstate(&fpu->state->xsave, fpu->state_mask);
+
+	wrmsrl_safe(MSR_IA32_XFD,
+		    (fpu->state_mask & xfeatures_mask_user_dynamic) ^
+		    xfeatures_mask_user_dynamic);
+}
+
+/**
+ * do_arch_prctl_state - Read or write the state permission.
+ * @fpu:	A struct task_struct * pointer
+ * @option:	A subfunction of arch_prctl()
+ * @arg2:	Either a pointer to userspace memory or state-component
+ *		bitmap value.
+ * Return:	0 if successful; otherwise, return a relevant error code.
+ */
+long do_arch_prctl_state(struct task_struct *tsk, int option, unsigned long arg2)
+{
+	u64 features_mask;
+
+	if (!cpu_feature_enabled(X86_FEATURE_FPU))
+		features_mask = 0;
+	else if (use_fxsr())
+		features_mask = XFEATURE_MASK_FPSSE;
+	else
+		features_mask = XFEATURE_MASK_FP;
+
+	switch (option) {
+	case ARCH_SET_STATE_ENABLE: {
+		u64 state_perm = arg2;
+
+		if (use_xsave())
+			features_mask = xfeatures_mask_uabi();
+
+		if (state_perm & ~features_mask)
+			return -EINVAL;
+
+		state_perm &= xfeatures_mask_user_perm();
+		if (!state_perm)
+			return 0;
+
+		tsk->group_leader->thread.fpu.state_perm |= state_perm;
+		return 0;
+	}
+	case ARCH_GET_STATE_ENABLE: {
+		if (use_xsave())
+			features_mask = get_group_state_perm(tsk);
+
+		return put_user(features_mask, (unsigned long __user *)arg2);
+	}
+	default:
+		return -EINVAL;
+	}
+}
+
 static void copy_feature(bool from_xstate, struct membuf *to, void *xstate,
 			 void *init_xstate, unsigned int size)
 {
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 7471102e2bed..b43e2b0f52f2 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -1016,13 +1016,16 @@ unsigned long get_wchan(struct task_struct *p)
 }
 
 long do_arch_prctl_common(struct task_struct *task, int option,
-			  unsigned long cpuid_enabled)
+			  unsigned long arg2)
 {
 	switch (option) {
 	case ARCH_GET_CPUID:
 		return get_cpuid_mode();
 	case ARCH_SET_CPUID:
-		return set_cpuid_mode(task, cpuid_enabled);
+		return set_cpuid_mode(task, arg2);
+	case ARCH_SET_STATE_ENABLE:
+	case ARCH_GET_STATE_ENABLE:
+		return do_arch_prctl_state(task, option, arg2);
 	}
 
 	return -EINVAL;
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 41c9855158d6..7aceff54a818 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -678,6 +678,8 @@ void set_personality_64bit(void)
 	   so it's not too bad. The main problem is just that
 	   32bit children are affected again. */
 	current->personality &= ~READ_IMPLIES_EXEC;
+
+	reset_state_perm(current);
 }
 
 static void __set_personality_x32(void)
@@ -723,6 +725,8 @@ void set_personality_ia32(bool x32)
 	/* Make sure to be in 32bit mode */
 	set_thread_flag(TIF_ADDR32);
 
+	reset_state_perm(current);
+
 	if (x32)
 		__set_personality_x32();
 	else
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 08fb461fc3e5..bbf30e73d156 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -1108,7 +1108,7 @@ DEFINE_IDTENTRY(exc_spurious_interrupt_bug)
 	 */
 }
 
-static __always_inline bool handle_xfd_event(struct fpu *fpu)
+static __always_inline bool handle_xfd_event(struct fpu *fpu, struct pt_regs *regs)
 {
 	bool handled = false;
 	u64 xfd_err;
@@ -1135,19 +1135,28 @@ static __always_inline bool handle_xfd_event(struct fpu *fpu)
 			int err = -1;
 
 			/*
-			 * Make sure not in interrupt context as handling a
-			 * trap from userspace.
+			 * Make sure that dynamic buffer expansion is permitted
+			 * and not in interrupt context as handling a trap from
+			 * userspace.
+			 *
+			 * Raise SIGILL with insufficient permission and SIGSEGV
+			 * with the buffer allocation failure.
 			 */
-			if (!WARN_ON(in_interrupt())) {
-				err = realloc_xstate_buffer(fpu, xfd_event);
-				if (!err)
-					wrmsrl_safe(MSR_IA32_XFD, (fpu->state_mask &
-								   xfeatures_mask_user_dynamic) ^
-								  xfeatures_mask_user_dynamic);
+			if (!state_permitted(current, xfd_event)) {
+				force_sig_fault(SIGILL, ILL_ILLOPC, error_get_trap_addr(regs));
+			} else {
+				if (!WARN_ON(in_interrupt())) {
+					err = realloc_xstate_buffer(fpu, xfd_event);
+					if (!err)
+						wrmsrl_safe(MSR_IA32_XFD,
+							    (fpu->state_mask &
+							     xfeatures_mask_user_dynamic) ^
+							    xfeatures_mask_user_dynamic);
+				}
+
+				if (err)
+					force_sig(SIGSEGV);
 			}
-
-			if (err)
-				force_sig(SIGSEGV);
 		}
 		handled = true;
 	}
@@ -1158,7 +1167,7 @@ DEFINE_IDTENTRY(exc_device_not_available)
 {
 	unsigned long cr0 = read_cr0();
 
-	if (handle_xfd_event(&current->thread.fpu))
+	if (handle_xfd_event(&current->thread.fpu, regs))
 		return;
 
 #ifdef CONFIG_MATH_EMULATION
-- 
2.17.1



* [PATCH v11 16/29] x86/fpu/xstate: Support both legacy and expanded signal XSTATE size
  2021-10-01 22:36 [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (14 preceding siblings ...)
  2021-10-01 22:37 ` [PATCH v11 15/29] x86/arch_prctl: Create ARCH_SET_STATE_ENABLE/ARCH_GET_STATE_ENABLE Chang S. Bae
@ 2021-10-01 22:37 ` Chang S. Bae
  2021-10-05 12:30   ` Thomas Gleixner
  2021-10-05 15:19   ` Thomas Gleixner
  2021-10-01 22:37 ` [PATCH v11 17/29] x86/fpu/xstate: Adjust the XSAVE feature table to address gaps in state component numbers Chang S. Bae
                   ` (15 subsequent siblings)
  31 siblings, 2 replies; 45+ messages in thread
From: Chang S. Bae @ 2021-10-01 22:37 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, lenb, dave.hansen, thiago.macieira, jing2.liu,
	ravi.v.shankar, linux-kernel, chang.seok.bae

Prepare to support two XSTATE sizes on the signal stack: legacy and
expanded. Legacy programs have not requested access to the
permission-required features, so the XSTATE on their signal stack
includes state components only up through PKRU.

Programs that request access to opt-in features will have an uncompacted
XSTATE that includes those features. If such programs also use
sigaltstack(2), they must ensure that the alternate stack is large enough
to hold that full XSTATE format. (This is most easily done by using
signal.h from glibc 2.34 or later.) If sigaltstack(2) is called before the
permission system call with an insufficient size, the kernel will not
grant permission to access the feature (ARCH_SET_STATE_ENABLE fails with
EPERM). Conversely, once permission has been granted, sigaltstack(2)
rejects an insufficient size with EINVAL.
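
The size selection and the alternate-stack check reduce to a few comparisons. The following is a sketch under assumptions: the struct and function names are hypothetical models of current_sig_xstate_size(), extend_sig_xstate_size() and arch_enough_sigaltstack(). The sizes used in the test are arbitrary; the kernel derives the real ones from CPUID.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* sizeof(FP_XSTATE_MAGIC2): the u32 marker written after the XSTATE data. */
#define FP_XSTATE_MAGIC2_SIZE 4u

/* Hypothetical model of the two sizes recorded in fpu_buf_cfg. */
struct sig_xstate_cfg {
	unsigned int user_size;		/* full uncompacted XSTATE size */
	unsigned int user_minsig_size;	/* legacy size, up through PKRU */
};

/* Like current_sig_xstate_size(): expanded only once permission is granted. */
static unsigned int sig_xstate_size(const struct sig_xstate_cfg *cfg, bool expanded)
{
	return expanded ? cfg->user_size : cfg->user_minsig_size;
}

/* Like extend_sig_xstate_size(): XSAVE frames carry the trailing magic. */
static unsigned int sig_frame_xstate_size(unsigned int size, bool use_xsave)
{
	return use_xsave ? size + FP_XSTATE_MAGIC2_SIZE : size;
}

/* Like arch_enough_sigaltstack(): only expanded tasks need the larger stack. */
static bool sigaltstack_big_enough(bool expanded, size_t ss_size,
				   size_t sigframe_size)
{
	return !expanded || ss_size >= sigframe_size;
}
```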

Introduce a new XSTATE size variable for the legacy stack and some helpers.
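
The legacy size is the end offset of the highest enabled feature that needs no permission, as reported by CPUID leaf 0xD. The following is a hedged userspace approximation of that logic in init_xstate_size(). It assumes GCC/Clang's `<cpuid.h>` on x86. `highest_feature()` and `legacy_sig_xstate_size()` are hypothetical names.

```c
#include <cpuid.h>
#include <stdint.h>

#define XSTATE_CPUID 0xd

/* fls64(mask) - 1: index of the highest set bit, or -1 for an empty mask. */
static int highest_feature(uint64_t mask)
{
	return mask ? 63 - __builtin_clzll(mask) : -1;
}

/*
 * Hypothetical sketch: CPUID.(EAX=0xD, ECX=n) reports the component size in
 * EAX and its offset in EBX for n >= 2. Returns 0 when the mask cannot be
 * handled this way or the leaf is unavailable.
 */
static unsigned int legacy_sig_xstate_size(uint64_t legacy_mask)
{
	unsigned int size, offset, ecx, edx;
	int nr = highest_feature(legacy_mask);

	if (nr < 2)	/* sub-leaves 0 and 1 have a different meaning */
		return 0;
	if (!__get_cpuid_count(XSTATE_CPUID, nr, &size, &offset, &ecx, &edx))
		return 0;
	return offset + size;
}
```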

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v10:
* Expand permission-required states rather than just XFD-protected state.
* Add sanity checks to sigaltstack() and ARCH_SET_STATE_ENABLE syscalls
  (Thomas Gleixner)
* Fix to access fpu->state_mask via the helper.

Changes from v9:
* Use get_group_state_perm() to check the permission.

Changes from v6:
* Massage the code comments.

Changes from v5:
* Added as a new patch.
---
 arch/x86/include/asm/fpu/internal.h | 25 ++++++++++--
 arch/x86/include/asm/fpu/xstate.h   | 23 ++++++++++-
 arch/x86/kernel/fpu/init.c          |  1 +
 arch/x86/kernel/fpu/signal.c        | 63 ++++++++++++++++++++---------
 arch/x86/kernel/fpu/xstate.c        | 29 +++++++++++++
 kernel/signal.c                     |  8 ++++
 6 files changed, 124 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 1aa8bc75b24d..06be4c247c97 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -337,15 +337,32 @@ static inline void os_xrstor(struct xregs_state *xstate, u64 mask)
  */
 static inline int xsave_to_user_sigframe(struct xregs_state __user *buf)
 {
+	u32 lmask, hmask;
+	u64 mask;
+	int err;
+
 	/*
 	 * Include the features which are not xsaved/rstored by the kernel
 	 * internally, e.g. PKRU. That's user space ABI and also required
 	 * to allow the signal handler to modify PKRU.
 	 */
-	u64 mask = xfeatures_mask_uabi();
-	u32 lmask = mask;
-	u32 hmask = mask >> 32;
-	int err;
+	mask = xfeatures_mask_uabi();
+
+	/* Include the permission-required states only when used. */
+	if (xfeatures_mask_user_perm()) {
+		u64 uabi_mask = mask;
+
+		mask = uabi_mask & ~xfeatures_mask_user_perm();
+
+		if (sig_xstate_expanded(current)) {
+			u64 cur_uabi_mask = uabi_mask & current->thread.fpu.state_mask;
+
+			mask |= cur_uabi_mask & xfeatures_mask_user_perm();
+		}
+	}
+
+	lmask = mask;
+	hmask = mask >> 32;
 
 	/*
 	 * Clear the xsave header first, so that reserved fields are
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 1f62a38e4ae1..e0437dfd897b 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -9,6 +9,7 @@
 #include <asm/fpu/api.h>
 #include <asm/user.h>
 #include <asm/prctl.h>
+#include <asm/elf.h>
 
 /* Bit 63 of XCR0 is reserved for future expansion */
 #define XFEATURE_MASK_EXTEND	(~(XFEATURE_MASK_FPSSE | (1ULL << 63)))
@@ -154,10 +155,13 @@ extern void __init update_regset_xstate_info(unsigned int size,
  *				contains all the enabled state components.
  * @user_size:			The size of user-space buffer for signal and
  *				ptrace frames, in the non-compacted format.
+ * @user_minsig_size:		The non-compacted legacy xstate size for signal.
+ *				Legacy programs do not request to access dynamic
+ *				states.
  */
 struct fpu_xstate_buffer_config {
 	unsigned int min_size, max_size;
-	unsigned int user_size;
+	unsigned int user_size, user_minsig_size;
 };
 
 extern struct fpu_xstate_buffer_config fpu_buf_cfg;
@@ -189,6 +193,23 @@ static inline bool state_permitted(struct task_struct *tsk, u64 state_mask)
 	return ((state_mask & get_group_state_perm(tsk)) == state_mask);
 }
 
+/**
+ * sig_xstate_expanded - Check if a task's xstate in sigframe is expanded.
+ *
+ * If a user thread is successful with ARCH_SET_STATE_ENABLE, its sigstack
+ * has to be at least AT_MINSIGSTKSZ. The sigframe's xstate size can
+ * include all the feature state.
+ *
+ * @tsk:	A struct task_struct * pointer
+ * Return:	True if the xstate is expanded; otherwise, false.
+ */
+static inline bool sig_xstate_expanded(struct task_struct *tsk)
+{
+	return ((get_group_state_perm(tsk) & xfeatures_mask_user_perm()) > 0);
+}
+
+extern bool arch_enough_sigaltstack(struct task_struct *tsk, size_t ss_size);
+
 void reset_state_perm(struct task_struct *tsk);
 long do_arch_prctl_state(struct task_struct *tsk, int option, unsigned long arg2);
 
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index cd1f3114f3ca..75bacda2ab87 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -210,6 +210,7 @@ static void __init fpu__init_system_xstate_size_legacy(void)
 	fpu_buf_cfg.min_size = xstate_size;
 	fpu_buf_cfg.max_size = xstate_size;
 	fpu_buf_cfg.user_size = xstate_size;
+	fpu_buf_cfg.user_minsig_size = xstate_size;
 }
 
 /* Legacy code to initialize eager fpu mode. */
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index fe2732db6d6b..3025346d8168 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -15,9 +15,25 @@
 #include <asm/sigframe.h>
 #include <asm/trace/fpu.h>
 
+/*
+ * Record the signal xstate size and feature bits. Exclude dynamic user
+ * states. See fpu__init_prepare_fx_sw_frame(). The opt-in tasks will
+ * dynamically adjust the data.
+ */
 static struct _fpx_sw_bytes fx_sw_reserved __ro_after_init;
 static struct _fpx_sw_bytes fx_sw_reserved_ia32 __ro_after_init;
 
+static unsigned int current_sig_xstate_size(void)
+{
+	return sig_xstate_expanded(current) ?
+	       fpu_buf_cfg.user_size : fpu_buf_cfg.user_minsig_size;
+}
+
+static inline int extend_sig_xstate_size(unsigned int size)
+{
+	return use_xsave() ? size + FP_XSTATE_MAGIC2_SIZE : size;
+}
+
 /*
  * Check for the presence of extended state information in the
  * user fpstate pointer in the sigcontext.
@@ -36,7 +52,7 @@ static inline int check_xstate_in_sigframe(struct fxregs_state __user *fxbuf,
 	/* Check for the first magic field and other error scenarios. */
 	if (fx_sw->magic1 != FP_XSTATE_MAGIC1 ||
 	    fx_sw->xstate_size < min_xstate_size ||
-	    fx_sw->xstate_size > fpu_buf_cfg.user_size ||
+	    fx_sw->xstate_size > current_sig_xstate_size() ||
 	    fx_sw->xstate_size > fx_sw->extended_size)
 		goto setfx;
 
@@ -94,20 +110,32 @@ static inline int save_fsave_header(struct task_struct *tsk, void __user *buf)
 
 static inline int save_xstate_epilog(void __user *buf, int ia32_frame)
 {
+	unsigned int current_xstate_size = current_sig_xstate_size();
 	struct xregs_state __user *x = buf;
-	struct _fpx_sw_bytes *sw_bytes;
+	struct _fpx_sw_bytes sw_bytes;
 	u32 xfeatures;
 	int err;
 
-	/* Setup the bytes not touched by the [f]xsave and reserved for SW. */
-	sw_bytes = ia32_frame ? &fx_sw_reserved_ia32 : &fx_sw_reserved;
-	err = __copy_to_user(&x->i387.sw_reserved, sw_bytes, sizeof(*sw_bytes));
+	/*
+	 * Setup the bytes not touched by the [f]xsave and reserved for SW.
+	 *
+	 * Use the recorded values if it matches with the current task. Otherwise,
+	 * adjust it.
+	 */
+	sw_bytes = ia32_frame ? fx_sw_reserved_ia32 : fx_sw_reserved;
+	if (sw_bytes.xstate_size != current_xstate_size) {
+		unsigned int default_xstate_size = sw_bytes.xstate_size;
+
+		sw_bytes.xfeatures = xfeatures_mask_uabi();
+		sw_bytes.xstate_size = current_xstate_size;
+		sw_bytes.extended_size += (current_xstate_size - default_xstate_size);
+	}
+	err = __copy_to_user(&x->i387.sw_reserved, &sw_bytes, sizeof(sw_bytes));
 
 	if (!use_xsave())
 		return err;
 
-	err |= __put_user(FP_XSTATE_MAGIC2,
-			  (__u32 __user *)(buf + fpu_buf_cfg.user_size));
+	err |= __put_user(FP_XSTATE_MAGIC2, (__u32 __user *)(buf + current_xstate_size));
 
 	/*
 	 * Read the xfeatures which we copied (directly from the cpu or
@@ -144,7 +172,7 @@ static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf)
 	else
 		err = fnsave_to_user_sigframe((struct fregs_state __user *) buf);
 
-	if (unlikely(err) && __clear_user(buf, fpu_buf_cfg.user_size))
+	if (unlikely(err) && __clear_user(buf, current_sig_xstate_size()))
 		err = -EFAULT;
 	return err;
 }
@@ -205,7 +233,7 @@ int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
 	fpregs_unlock();
 
 	if (ret) {
-		if (!fault_in_pages_writeable(buf_fx, fpu_buf_cfg.user_size))
+		if (!fault_in_pages_writeable(buf_fx, current_sig_xstate_size()))
 			goto retry;
 		return -EFAULT;
 	}
@@ -418,18 +446,13 @@ static int __fpu_restore_sig(void __user *buf, void __user *buf_fx,
 	fpregs_unlock();
 	return ret;
 }
-static inline int xstate_sigframe_size(void)
-{
-	return use_xsave() ? fpu_buf_cfg.user_size + FP_XSTATE_MAGIC2_SIZE :
-			fpu_buf_cfg.user_size;
-}
 
 /*
  * Restore FPU state from a sigframe:
  */
 int fpu__restore_sig(void __user *buf, int ia32_frame)
 {
-	unsigned int size = xstate_sigframe_size();
+	unsigned int size = extend_sig_xstate_size(current_sig_xstate_size());
 	struct fpu *fpu = &current->thread.fpu;
 	void __user *buf_fx = buf;
 	bool ia32_fxstate = false;
@@ -476,7 +499,7 @@ unsigned long
 fpu__alloc_mathframe(unsigned long sp, int ia32_frame,
 		     unsigned long *buf_fx, unsigned long *size)
 {
-	unsigned long frame_size = xstate_sigframe_size();
+	unsigned long frame_size = extend_sig_xstate_size(current_sig_xstate_size());
 
 	*buf_fx = sp = round_down(sp - frame_size, 64);
 	if (ia32_frame && use_fxsr()) {
@@ -491,7 +514,7 @@ fpu__alloc_mathframe(unsigned long sp, int ia32_frame,
 
 unsigned long fpu__get_fpstate_size(void)
 {
-	unsigned long ret = xstate_sigframe_size();
+	unsigned long ret = extend_sig_xstate_size(fpu_buf_cfg.user_size);
 
 	/*
 	 * This space is needed on (most) 32-bit kernels, or when a 32-bit
@@ -516,12 +539,12 @@ unsigned long fpu__get_fpstate_size(void)
  */
 void fpu__init_prepare_fx_sw_frame(void)
 {
-	int ext_size = fpu_buf_cfg.user_size + FP_XSTATE_MAGIC2_SIZE;
-	int xstate_size = fpu_buf_cfg.user_size;
+	int ext_size = fpu_buf_cfg.user_minsig_size + FP_XSTATE_MAGIC2_SIZE;
+	int xstate_size = fpu_buf_cfg.user_minsig_size;
 
 	fx_sw_reserved.magic1 = FP_XSTATE_MAGIC1;
 	fx_sw_reserved.extended_size = ext_size;
-	fx_sw_reserved.xfeatures = xfeatures_mask_uabi();
+	fx_sw_reserved.xfeatures = xfeatures_mask_uabi() & ~xfeatures_mask_user_perm();
 	fx_sw_reserved.xstate_size = xstate_size;
 
 	if (IS_ENABLED(CONFIG_IA32_EMULATION) ||
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 38c003f840b4..5905857fbbc0 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -821,6 +821,21 @@ static int __init init_xstate_size(void)
 	 * User space is always in standard format.
 	 */
 	fpu_buf_cfg.user_size = xsave_size;
+
+	/*
+	 * The minimum signal xstate size is for legacy user threads
+	 * that do not access dynamic states.
+	 */
+	if (xfeatures_mask_user_perm()) {
+		int nr = fls64(xfeatures_mask_uabi() & ~xfeatures_mask_user_perm()) - 1;
+		unsigned int size, offset, ecx, edx;
+
+		cpuid_count(XSTATE_CPUID, nr, &size, &offset, &ecx, &edx);
+		fpu_buf_cfg.user_minsig_size = offset + size;
+	} else {
+		fpu_buf_cfg.user_minsig_size = xsave_size;
+	}
+
 	return 0;
 }
 
@@ -1252,6 +1267,13 @@ long do_arch_prctl_state(struct task_struct *tsk, int option, unsigned long arg2
 		if (!state_perm)
 			return 0;
 
+		/*
+		 * Disallow when sigaltstack is not enough for the
+		 * AT_MINSIGSTKSZ value.
+		 */
+		if (tsk->sas_ss_size > 0 && tsk->sas_ss_size < get_sigframe_size())
+			return -EPERM;
+
 		tsk->group_leader->thread.fpu.state_perm |= state_perm;
 		return 0;
 	}
@@ -1266,6 +1288,13 @@ long do_arch_prctl_state(struct task_struct *tsk, int option, unsigned long arg2
 	}
 }
 
+bool arch_enough_sigaltstack(struct task_struct *tsk, size_t ss_size)
+{
+	if (sig_xstate_expanded(tsk))
+		return ss_size >= get_sigframe_size();
+	return true;
+}
+
 static void copy_feature(bool from_xstate, struct membuf *to, void *xstate,
 			 void *init_xstate, unsigned int size)
 {
diff --git a/kernel/signal.c b/kernel/signal.c
index 952741f6d0f9..9a516b6e795e 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -4151,6 +4151,11 @@ int do_sigaction(int sig, struct k_sigaction *act, struct k_sigaction *oact)
 	return 0;
 }
 
+bool __weak arch_enough_sigaltstack(struct task_struct *tsk, size_t ss_size)
+{
+	return true;
+}
+
 static int
 do_sigaltstack (const stack_t *ss, stack_t *oss, unsigned long sp,
 		size_t min_ss_size)
@@ -4187,6 +4192,9 @@ do_sigaltstack (const stack_t *ss, stack_t *oss, unsigned long sp,
 				return -ENOMEM;
 		}
 
+		if (!arch_enough_sigaltstack(t, ss_size))
+			return -EINVAL;
+
 		t->sas_ss_sp = (unsigned long) ss_sp;
 		t->sas_ss_size = ss_size;
 		t->sas_ss_flags = ss_flags;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 45+ messages in thread
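The minimum signal XSTATE size computed in init_xstate_size() above is the end offset of the highest-numbered user state component that needs no permission. Below is a minimal user-space sketch of that arithmetic and of the arch_enough_sigaltstack() rule. The struct xcomp table stands in for CPUID(0xd, nr), the helper names are hypothetical, and the offsets in the usage are illustrative rather than read from real hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for CPUID(0xd, nr): standard-format offset
 * and size of XSAVE state component nr. */
struct xcomp {
	uint32_t offset;
	uint32_t size;
};

/* Same contract as the kernel's fls64(): 1-based index of the most
 * significant set bit, or 0 when x == 0. */
static int fls64_sketch(uint64_t x)
{
	return x ? 64 - __builtin_clzll(x) : 0;
}

/* Minimum signal XSAVE size for threads without permission-required
 * states: the end of the highest-numbered component that needs no
 * permission. With no permission-required states, use the full size. */
static uint32_t minsig_size(uint64_t uabi_mask, uint64_t perm_mask,
			    const struct xcomp *tbl, uint32_t full_size)
{
	int nr;

	if (!perm_mask)
		return full_size;

	nr = fls64_sketch(uabi_mask & ~perm_mask) - 1;
	assert(nr >= 0);
	return tbl[nr].offset + tbl[nr].size;
}

/* Mirrors arch_enough_sigaltstack(): only a thread whose signal frame
 * has been expanded needs an alternate stack of the larger size. */
static bool enough_sigaltstack(bool expanded, size_t ss_size,
			       size_t sigframe_size)
{
	return !expanded || ss_size >= sigframe_size;
}
```

With PKRU (component 9) as the highest non-permission state, the minimum size ends at PKRU's offset plus its size, well short of the full buffer that includes XTILEDATA.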

* [PATCH v11 17/29] x86/fpu/xstate: Adjust the XSAVE feature table to address gaps in state component numbers
  2021-10-01 22:36 [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (15 preceding siblings ...)
  2021-10-01 22:37 ` [PATCH v11 16/29] x86/fpu/xstate: Support both legacy and expanded signal XSTATE size Chang S. Bae
@ 2021-10-01 22:37 ` Chang S. Bae
  2021-10-01 22:37 ` [PATCH v11 18/29] x86/fpu/xstate: Disable XSTATE support if an inconsistent state is detected Chang S. Bae
                   ` (14 subsequent siblings)
  31 siblings, 0 replies; 45+ messages in thread
From: Chang S. Bae @ 2021-10-01 22:37 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, lenb, dave.hansen, thiago.macieira, jing2.liu,
	ravi.v.shankar, linux-kernel, chang.seok.bae

At compile time, xfeatures_mask_all includes all possible XCR0 features. At
run time, fpu__init_system_xstate() clears the features in xfeatures_mask_all
that are not enabled in CPUID. It does this by looping through all possible
XCR0 features.

Update the code to handle the possibility of gaps in the XCR0 feature bit
numbers.

No functional change.
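A small model of the gap-tolerant loop follows. It uses hypothetical feature IDs in place of the X86_FEATURE_* constants. As in the kernel, the FPU's ID is 0, so slot 0 is checked unconditionally, while other zero-filled slots are treated as gaps:

```c
#include <assert.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical feature IDs. As with X86_FEATURE_FPU in the kernel,
 * the FPU's ID is 0, so 0 means either "FPU" or "empty slot". */
enum { F_FPU = 0, F_XMM = 1, F_AVX = 2, F_AMX_TILE = 3 };

/* Designated initializers tie each entry to its state-component
 * number; slots that are not listed (the gaps) are zero-filled. */
static const unsigned short feat_for_xcomp[] = {
	[0]  = F_FPU,
	[1]  = F_XMM,
	[2]  = F_AVX,
	[17] = F_AMX_TILE,
	[18] = F_AMX_TILE,
};

/* cpu_caps has bit f set when feature f is present. Slot 0 is checked
 * even though it holds 0; every other zero slot is a gap and is left
 * alone. */
static uint64_t prune_xfeatures(uint64_t xmask, uint64_t cpu_caps)
{
	for (unsigned int i = 0; i < ARRAY_SIZE(feat_for_xcomp); i++) {
		unsigned short f = feat_for_xcomp[i];

		if ((i == 0 || f) && !(cpu_caps & (1ULL << f)))
			xmask &= ~(1ULL << i);
	}
	return xmask;
}
```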

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v5:
* Folded a few lines.

Changes from v4:
* Simplified the implementation. (Thomas Gleixner)
* Updated the patch title accordingly.

Changes from v1:
* Rebased on the upstream kernel (5.10)
---
 arch/x86/kernel/fpu/xstate.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 5905857fbbc0..b6bf32cb650d 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -43,18 +43,17 @@ static const char *xfeature_names[] =
 	"unknown xstate feature"	,
 };
 
-static short xsave_cpuid_features[] __initdata = {
-	X86_FEATURE_FPU,
-	X86_FEATURE_XMM,
-	X86_FEATURE_AVX,
-	X86_FEATURE_MPX,
-	X86_FEATURE_MPX,
-	X86_FEATURE_AVX512F,
-	X86_FEATURE_AVX512F,
-	X86_FEATURE_AVX512F,
-	X86_FEATURE_INTEL_PT,
-	X86_FEATURE_PKU,
-	X86_FEATURE_ENQCMD,
+static unsigned short xsave_cpuid_features[] __initdata = {
+	[XFEATURE_SSE]				= X86_FEATURE_XMM,
+	[XFEATURE_YMM]				= X86_FEATURE_AVX,
+	[XFEATURE_BNDREGS]			= X86_FEATURE_MPX,
+	[XFEATURE_BNDCSR]			= X86_FEATURE_MPX,
+	[XFEATURE_OPMASK]			= X86_FEATURE_AVX512F,
+	[XFEATURE_ZMM_Hi256]			= X86_FEATURE_AVX512F,
+	[XFEATURE_Hi16_ZMM]			= X86_FEATURE_AVX512F,
+	[XFEATURE_PT_UNIMPLEMENTED_SO_FAR]	= X86_FEATURE_INTEL_PT,
+	[XFEATURE_PKRU]				= X86_FEATURE_PKU,
+	[XFEATURE_PASID]			= X86_FEATURE_ENQCMD,
 };
 
 /*
@@ -909,7 +908,8 @@ void __init fpu__init_system_xstate(void)
 	 * Clear XSAVE features that are disabled in the normal CPUID.
 	 */
 	for (i = 0; i < ARRAY_SIZE(xsave_cpuid_features); i++) {
-		if (!boot_cpu_has(xsave_cpuid_features[i]))
+		if (((i == 0) || xsave_cpuid_features[i]) &&
+		    !boot_cpu_has(xsave_cpuid_features[i]))
 			xfeatures_mask_all &= ~BIT_ULL(i);
 	}
 
-- 
2.17.1


* [PATCH v11 18/29] x86/fpu/xstate: Disable XSTATE support if an inconsistent state is detected
  2021-10-01 22:36 [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (16 preceding siblings ...)
  2021-10-01 22:37 ` [PATCH v11 17/29] x86/fpu/xstate: Adjust the XSAVE feature table to address gaps in state component numbers Chang S. Bae
@ 2021-10-01 22:37 ` Chang S. Bae
  2021-10-01 22:37 ` [PATCH v11 19/29] x86/cpufeatures/amx: Enumerate Advanced Matrix Extension (AMX) feature bits Chang S. Bae
                   ` (13 subsequent siblings)
  31 siblings, 0 replies; 45+ messages in thread
From: Chang S. Bae @ 2021-10-01 22:37 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, lenb, dave.hansen, thiago.macieira, jing2.liu,
	ravi.v.shankar, linux-kernel, chang.seok.bae

The kernel has a sanity check between two methods to calculate XSTATE size.
In the unlikely event that they disagree, disable the use of XSTATE.
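This patch replaces warnings with a check macro that logs the problem and makes the enclosing function return -EINVAL. The computed size is reported through an out-parameter. A user-space sketch of that pattern follows. The macro and function names are hypothetical, and fprintf() stands in for pr_err():

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Log a mismatch and make the *enclosing* function return -EINVAL. */
#define CHECK_SZ(sz, expected, name) do {				\
	if ((sz) != (expected)) {					\
		fprintf(stderr, "%s: struct is %zu bytes, cpu state %u bytes\n", \
			(name), (size_t)(expected), (unsigned int)(sz)); \
		return -EINVAL;						\
	}								\
} while (0)

/* Returns 0 and stores the total in *out, or -EINVAL on the first
 * component whose CPU-reported size disagrees with the software one. */
static int total_size(const unsigned int *cpu_sz, const unsigned int *sw_sz,
		      int n, unsigned int *out)
{
	unsigned int total = 0;

	for (int i = 0; i < n; i++) {
		CHECK_SZ(cpu_sz[i], sw_sz[i], "component");
		total += cpu_sz[i];
	}
	if (out)
		*out = total;
	return 0;
}
```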

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v4:
* Added as a new patch. (Thomas Gleixner)
---
 arch/x86/kernel/fpu/xstate.c | 40 ++++++++++++++++++++++++------------
 1 file changed, 27 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index b6bf32cb650d..e5a734d88660 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -608,11 +608,11 @@ static void __xstate_dump_leaves(void)
 } while (0)
 
 #define XCHECK_SZ(sz, nr, nr_macro, __struct) do {			\
-	if ((nr == nr_macro) &&						\
-	    WARN_ONCE(sz != sizeof(__struct),				\
-		"%s: struct is %zu bytes, cpu state %d bytes\n",	\
-		__stringify(nr_macro), sizeof(__struct), sz)) {		\
+	if ((nr == nr_macro) &&	(sz != sizeof(__struct))) {		\
+		pr_err("%s: struct is %zu bytes, cpu state %d bytes\n",	\
+		       __stringify(nr_macro), sizeof(__struct), sz);	\
 		__xstate_dump_leaves();					\
+		return -EINVAL;						\
 	}								\
 } while (0)
 
@@ -621,7 +621,7 @@ static void __xstate_dump_leaves(void)
  * that our software representation matches what the CPU
  * tells us about the state's size.
  */
-static void check_xstate_against_struct(int nr)
+static int check_xstate_against_struct(int nr)
 {
 	/*
 	 * Ask the CPU for the size of the state.
@@ -649,9 +649,12 @@ static void check_xstate_against_struct(int nr)
 	    (nr >= XFEATURE_MAX) ||
 	    (nr == XFEATURE_PT_UNIMPLEMENTED_SO_FAR) ||
 	    ((nr >= XFEATURE_RSRVD_COMP_11) && (nr <= XFEATURE_LBR))) {
-		WARN_ONCE(1, "no structure for xstate: %d\n", nr);
+		pr_err("no structure for xstate: %d\n", nr);
 		XSTATE_WARN_ON(1);
+		return -EINVAL;
 	}
+
+	return 0;
 }
 
 /**
@@ -664,13 +667,14 @@ static void check_xstate_against_struct(int nr)
  * excluded. Only the size of the buffer for task->fpu is checked here.
  *
  * @include_dynamic_states:	A knob to include dynamic states or not.
+ * @size:			A pointer to record the size.
  *
- * Return:			The calculated xstate size.
+ * Return:			0 if successful; otherwise, error code.
  */
-static unsigned int calculate_xstate_size(bool include_dynamic_states)
+static int calculate_xstate_size(bool include_dynamic_states, unsigned int *size)
 {
 	unsigned int xstate_size = FXSAVE_SIZE + XSAVE_HDR_SIZE;
-	int i;
+	int i, err;
 
 	for (i = FIRST_EXTENDED_XFEATURE; i < XFEATURE_MAX; i++) {
 		if (!xfeature_enabled(i))
@@ -679,7 +683,10 @@ static unsigned int calculate_xstate_size(bool include_dynamic_states)
 		if ((xfeatures_mask_user_dynamic & BIT_ULL(i)) && !include_dynamic_states)
 			continue;
 
-		check_xstate_against_struct(i);
+		err = check_xstate_against_struct(i);
+		if (err)
+			return err;
+
 		/*
 		 * Supervisor state components can be managed only by
 		 * XSAVES.
@@ -705,7 +712,9 @@ static unsigned int calculate_xstate_size(bool include_dynamic_states)
 		xstate_size += xfeature_size(i);
 	}
 
-	return xstate_size;
+	if (size)
+		*size = xstate_size;
+	return 0;
 }
 
 /*
@@ -791,6 +800,7 @@ static int __init init_xstate_size(void)
 	/* Recompute the context size for enabled features: */
 	unsigned int possible_xstate_size, xstate_size;
 	unsigned int xsave_size;
+	int err;
 
 	xsave_size = get_xsave_size();
 
@@ -803,14 +813,18 @@ static int __init init_xstate_size(void)
 	 * Calculate the maximum xstate size, including the dynamic states.
 	 */
 	fpu_buf_cfg.max_size = possible_xstate_size;
-	xstate_size = calculate_xstate_size(true);
+	err = calculate_xstate_size(true, &xstate_size);
+	if (err)
+		return err;
 	XSTATE_WARN_ON(possible_xstate_size != xstate_size);
 
 	/*
 	 * Calculate the minimum xstate size, i.e., excluding the dynamic
 	 * xstates.
 	 */
-	xstate_size = calculate_xstate_size(false);
+	err = calculate_xstate_size(false, &xstate_size);
+	if (err)
+		return err;
 	if (!is_supported_xstate_size(xstate_size))
 		return -EINVAL;
 
-- 
2.17.1


* [PATCH v11 19/29] x86/cpufeatures/amx: Enumerate Advanced Matrix Extension (AMX) feature bits
  2021-10-01 22:36 [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (17 preceding siblings ...)
  2021-10-01 22:37 ` [PATCH v11 18/29] x86/fpu/xstate: Disable XSTATE support if an inconsistent state is detected Chang S. Bae
@ 2021-10-01 22:37 ` Chang S. Bae
  2021-10-01 22:37 ` [PATCH v11 20/29] x86/fpu/amx: Define AMX state components and have it used for boot-time checks Chang S. Bae
                   ` (12 subsequent siblings)
  31 siblings, 0 replies; 45+ messages in thread
From: Chang S. Bae @ 2021-10-01 22:37 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, lenb, dave.hansen, thiago.macieira, jing2.liu,
	ravi.v.shankar, linux-kernel, chang.seok.bae

Intel's Advanced Matrix Extension (AMX) is a new 64-bit extended feature
consisting of two-dimensional registers and an accelerator unit. The first
implementation of the latter is the tile matrix multiply unit (TMUL). TMUL
performs SIMD dot-products on four bytes (INT8) or two bfloat16
floating-point (BF16) elements.

Enumerate this hardware capability so that it is shown as 'amx_tile',
'amx_bf16', and 'amx_int8' in /proc/cpuinfo.
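The three bits live in CPUID.(EAX=7,ECX=0):EDX, at the positions given by the new X86_FEATURE_AMX_* defines (bits 22, 24 and 25 of word 18). The sketch below decodes them and applies the dependencies added to cpuid-deps.c. The struct and function are hypothetical. The kernel enforces these dependencies by clearing capabilities, not through a helper like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=7,ECX=0):EDX bit positions, matching the defines above. */
#define AMX_BF16_BIT	22
#define AMX_TILE_BIT	24
#define AMX_INT8_BIT	25

struct amx_caps {
	bool tile, bf16, int8;
};

static struct amx_caps decode_amx(uint32_t edx, bool has_xsave)
{
	struct amx_caps c = {
		.tile = edx & (1u << AMX_TILE_BIT),
		.bf16 = edx & (1u << AMX_BF16_BIT),
		.int8 = edx & (1u << AMX_INT8_BIT),
	};

	/* AMX_TILE depends on XSAVE ... */
	if (!has_xsave)
		c.tile = false;
	/* ... and AMX_INT8 / AMX_BF16 depend on AMX_TILE. */
	if (!c.tile)
		c.int8 = c.bf16 = false;
	return c;
}
```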

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v4:
* Massaged the changelog a bit.
---
 arch/x86/include/asm/cpufeatures.h | 3 +++
 arch/x86/kernel/cpu/cpuid-deps.c   | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index ab7b3a2de85d..dc0fb04cce69 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -386,7 +386,10 @@
 #define X86_FEATURE_TSXLDTRK		(18*32+16) /* TSX Suspend Load Address Tracking */
 #define X86_FEATURE_PCONFIG		(18*32+18) /* Intel PCONFIG */
 #define X86_FEATURE_ARCH_LBR		(18*32+19) /* Intel ARCH LBR */
+#define X86_FEATURE_AMX_BF16		(18*32+22) /* AMX BF16 Support */
 #define X86_FEATURE_AVX512_FP16		(18*32+23) /* AVX512 FP16 */
+#define X86_FEATURE_AMX_TILE		(18*32+24) /* AMX tile Support */
+#define X86_FEATURE_AMX_INT8		(18*32+25) /* AMX INT8 Support */
 #define X86_FEATURE_SPEC_CTRL		(18*32+26) /* "" Speculation Control (IBRS + IBPB) */
 #define X86_FEATURE_INTEL_STIBP		(18*32+27) /* "" Single Thread Indirect Branch Predictors */
 #define X86_FEATURE_FLUSH_L1D		(18*32+28) /* Flush L1D cache */
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 7f891d2eb52e..9a520ab259ac 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -76,6 +76,9 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_SGX1,			X86_FEATURE_SGX       },
 	{ X86_FEATURE_SGX2,			X86_FEATURE_SGX1      },
 	{ X86_FEATURE_XFD,			X86_FEATURE_XSAVE     },
+	{ X86_FEATURE_AMX_TILE,			X86_FEATURE_XSAVE     },
+	{ X86_FEATURE_AMX_INT8,			X86_FEATURE_AMX_TILE  },
+	{ X86_FEATURE_AMX_BF16,			X86_FEATURE_AMX_TILE  },
 	{}
 };
 
-- 
2.17.1


* [PATCH v11 20/29] x86/fpu/amx: Define AMX state components and have it used for boot-time checks
  2021-10-01 22:36 [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (18 preceding siblings ...)
  2021-10-01 22:37 ` [PATCH v11 19/29] x86/cpufeatures/amx: Enumerate Advanced Matrix Extension (AMX) feature bits Chang S. Bae
@ 2021-10-01 22:37 ` Chang S. Bae
  2021-10-01 22:37 ` [PATCH v11 21/29] x86/fpu/amx: Initialize child's AMX state Chang S. Bae
                   ` (11 subsequent siblings)
  31 siblings, 0 replies; 45+ messages in thread
From: Chang S. Bae @ 2021-10-01 22:37 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, lenb, dave.hansen, thiago.macieira, jing2.liu,
	ravi.v.shankar, linux-kernel, chang.seok.bae

Linux uses check_xstate_against_struct() to sanity check the size of
XSTATE-enabled features. AMX is an XSAVE-enabled feature whose size is not
hard-coded but discoverable at run-time via CPUID.

The AMX state is composed of state components 17 and 18, both of which are
user state components. State component 17, XTILECFG, holds a 64-byte
tile-related control register. State component 18, XTILEDATA, contains the
actual tile data, and its size varies between implementations. The
architectural maximum, as defined in CPUID(0x1d, 1): EAX[15:0], is one byte
less than 64KB. The first implementation supports 8KB.

Check the XTILEDATA state size dynamically. AMX introduces new tile
registers, TMM. Define a struct for a single register only, read the number
of registers from CPUID, and cross-check the resulting total size against
CPUID again.

Define the permission-required states as XTILECFG and all later states.
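Below is a user-space sketch of the XTILEDATA cross-check. It assumes the CPUID(0x1d, palid) results have already been collected into an array, with index 0 holding subleaf 0. The 1KB tile size mirrors struct xtile_data. The names are hypothetical, and the register values in the usage are illustrative of a palette with 8 tiles of 1KB each:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define TILE_BYTES 1024u	/* sizeof(struct xtile_data) in the patch */

/* One CPUID(0x1d, palid) result; only EAX and EBX are used here. */
struct palette {
	uint32_t eax, ebx;
};

/*
 * For palettes 1..max_palid: EAX[31:16] is bytes per tile and
 * EBX[31:16] is max names (number of tiles). Returns 0 when
 * TILE_BYTES * (largest tile count) equals the XTILEDATA size
 * reported by CPUID, -EINVAL otherwise.
 */
static int check_xtile_data(const struct palette *pal, uint32_t max_palid,
			    uint32_t xstate_size)
{
	uint16_t max_tile = 0;

	for (uint32_t palid = 1; palid <= max_palid; palid++) {
		uint16_t tile_size = pal[palid].eax >> 16;
		uint16_t names = pal[palid].ebx >> 16;

		if (tile_size != TILE_BYTES)
			return -EINVAL;
		if (names > max_tile)
			max_tile = names;
	}
	return TILE_BYTES * max_tile == xstate_size ? 0 : -EINVAL;
}
```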

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v10:
* Expand permission-required states to include XTILECFG and later, rather
  than just XFD-protected states.

Changes from v8:
* bugfix: Fix off-by-one-error in check_xstate_against_struct() feature
  number argument.

Changes from v4:
* Changed to return an error when tile data size mismatches. (Thomas Gleixner)
* Updated the function description and code comments.

Changes from v2:
* Updated the code comments.

Changes from v1:
* Rebased on the upstream kernel (5.10)
---
 arch/x86/include/asm/fpu/types.h  | 27 +++++++++++
 arch/x86/include/asm/fpu/xstate.h |  6 ++-
 arch/x86/kernel/fpu/xstate.c      | 80 ++++++++++++++++++++++++++++++-
 3 files changed, 110 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 617184b0afec..1e0a6f73d8a9 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -120,6 +120,9 @@ enum xfeature {
 	XFEATURE_RSRVD_COMP_13,
 	XFEATURE_RSRVD_COMP_14,
 	XFEATURE_LBR,
+	XFEATURE_RSRVD_COMP_16,
+	XFEATURE_XTILE_CFG,
+	XFEATURE_XTILE_DATA,
 
 	XFEATURE_MAX,
 };
@@ -136,11 +139,15 @@ enum xfeature {
 #define XFEATURE_MASK_PKRU		(1 << XFEATURE_PKRU)
 #define XFEATURE_MASK_PASID		(1 << XFEATURE_PASID)
 #define XFEATURE_MASK_LBR		(1 << XFEATURE_LBR)
+#define XFEATURE_MASK_XTILE_CFG	(1 << XFEATURE_XTILE_CFG)
+#define XFEATURE_MASK_XTILE_DATA	(1 << XFEATURE_XTILE_DATA)
 
 #define XFEATURE_MASK_FPSSE		(XFEATURE_MASK_FP | XFEATURE_MASK_SSE)
 #define XFEATURE_MASK_AVX512		(XFEATURE_MASK_OPMASK \
 					 | XFEATURE_MASK_ZMM_Hi256 \
 					 | XFEATURE_MASK_Hi16_ZMM)
+#define XFEATURE_MASK_XTILE		(XFEATURE_MASK_XTILE_DATA \
+					 | XFEATURE_MASK_XTILE_CFG)
 
 #define FIRST_EXTENDED_XFEATURE	XFEATURE_YMM
 
@@ -153,6 +160,9 @@ struct reg_256_bit {
 struct reg_512_bit {
 	u8	regbytes[512/8];
 };
+struct reg_1024_byte {
+	u8	regbytes[1024];
+};
 
 /*
  * State component 2:
@@ -255,6 +265,23 @@ struct arch_lbr_state {
 	u64 ler_to;
 	u64 ler_info;
 	struct lbr_entry		entries[];
+};
+
+/*
+ * State component 17: 64-byte tile configuration register.
+ */
+struct xtile_cfg {
+	u64				tcfg[8];
+} __packed;
+
+/*
+ * State component 18: 1KB tile data register.
+ * Each register represents 16 64-byte rows of the matrix
+ * data. But the number of registers depends on the actual
+ * implementation.
+ */
+struct xtile_data {
+	struct reg_1024_byte		tmm;
 } __packed;
 
 /*
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index e0437dfd897b..e3d431e04bbf 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -16,6 +16,8 @@
 
 #define XSTATE_CPUID		0x0000000d
 
+#define TILE_CPUID		0x0000001d
+
 #define FXSAVE_SIZE	512
 
 #define XSAVE_HDR_SIZE	    64
@@ -37,8 +39,8 @@
 				      XFEATURE_MASK_BNDREGS | \
 				      XFEATURE_MASK_BNDCSR)
 
-/* Require ARCH_SET_STATE_ENABLE for future features  */
-#define XFEATURE_MASK_PERMISSION_REQUIRED GENMASK_ULL(63, XFEATURE_MAX)
+/* Require ARCH_SET_STATE_ENABLE from XTILE_CFG and later states */
+#define XFEATURE_MASK_PERMISSION_REQUIRED GENMASK_ULL(63, XFEATURE_XTILE_CFG)
 
 /*
  * Features which are restored when returning to user space.
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index e5a734d88660..2ebc98c4b496 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -41,6 +41,14 @@ static const char *xfeature_names[] =
 	"Protection Keys User registers",
 	"PASID state",
 	"unknown xstate feature"	,
+	"unknown xstate feature"	,
+	"unknown xstate feature"	,
+	"unknown xstate feature"	,
+	"unknown xstate feature"	,
+	"unknown xstate feature"	,
+	"AMX Tile config"		,
+	"AMX Tile data"			,
+	"unknown xstate feature"	,
 };
 
 static unsigned short xsave_cpuid_features[] __initdata = {
@@ -54,6 +62,8 @@ static unsigned short xsave_cpuid_features[] __initdata = {
 	[XFEATURE_PT_UNIMPLEMENTED_SO_FAR]	= X86_FEATURE_INTEL_PT,
 	[XFEATURE_PKRU]				= X86_FEATURE_PKU,
 	[XFEATURE_PASID]			= X86_FEATURE_ENQCMD,
+	[XFEATURE_XTILE_CFG]			= X86_FEATURE_AMX_TILE,
+	[XFEATURE_XTILE_DATA]			= X86_FEATURE_AMX_TILE,
 };
 
 /*
@@ -343,6 +353,8 @@ static void __init print_xstate_features(void)
 	print_xstate_feature(XFEATURE_MASK_Hi16_ZMM);
 	print_xstate_feature(XFEATURE_MASK_PKRU);
 	print_xstate_feature(XFEATURE_MASK_PASID);
+	print_xstate_feature(XFEATURE_MASK_XTILE_CFG);
+	print_xstate_feature(XFEATURE_MASK_XTILE_DATA);
 }
 
 /*
@@ -616,6 +628,67 @@ static void __xstate_dump_leaves(void)
 	}								\
 } while (0)
 
+/**
+ * check_xtile_data_against_struct - Check tile data state size.
+ *
+ * Calculate the state size by multiplying the size of a single tile, as
+ * recorded in a C struct, by the number of tiles that the CPU reports.
+ * Compare the provided size with the result.
+ *
+ * @size:	The tile data state size
+ *
+ * Returns:	0 on success, -EINVAL on mismatch.
+ */
+static int check_xtile_data_against_struct(int size)
+{
+	u32 max_palid, palid, state_size;
+	u32 eax, ebx, ecx, edx;
+	u16 max_tile;
+
+	/*
+	 * Check the maximum palette id:
+	 *   eax: the highest numbered palette subleaf.
+	 */
+	cpuid_count(TILE_CPUID, 0, &max_palid, &ebx, &ecx, &edx);
+
+	/*
+	 * Cross-check each tile size and find the maximum number of
+	 * supported tiles.
+	 */
+	for (palid = 1, max_tile = 0; palid <= max_palid; palid++) {
+		u16 tile_size, max;
+
+		/*
+		 * Check the tile size info:
+		 *   eax[31:16]:  bytes per tile
+		 *   ebx[31:16]:  the max names (or max number of tiles)
+		 */
+		cpuid_count(TILE_CPUID, palid, &eax, &ebx, &ecx, &edx);
+		tile_size = eax >> 16;
+		max = ebx >> 16;
+
+		if (tile_size != sizeof(struct xtile_data)) {
+			pr_err("%s: struct is %zu bytes, cpu xtile %d bytes\n",
+			       __stringify(XFEATURE_XTILE_DATA),
+			       sizeof(struct xtile_data), tile_size);
+			__xstate_dump_leaves();
+			return -EINVAL;
+		}
+
+		if (max > max_tile)
+			max_tile = max;
+	}
+
+	state_size = sizeof(struct xtile_data) * max_tile;
+	if (size != state_size) {
+		pr_err("%s: calculated size is %u bytes, cpu state %d bytes\n",
+		       __stringify(XFEATURE_XTILE_DATA), state_size, size);
+		__xstate_dump_leaves();
+		return -EINVAL;
+	}
+	return 0;
+}
+
 /*
  * We have a C struct for each 'xstate'.  We need to ensure
  * that our software representation matches what the CPU
@@ -639,6 +712,11 @@ static int check_xstate_against_struct(int nr)
 	XCHECK_SZ(sz, nr, XFEATURE_Hi16_ZMM,  struct avx_512_hi16_state);
 	XCHECK_SZ(sz, nr, XFEATURE_PKRU,      struct pkru_state);
 	XCHECK_SZ(sz, nr, XFEATURE_PASID,     struct ia32_pasid_state);
+	XCHECK_SZ(sz, nr, XFEATURE_XTILE_CFG, struct xtile_cfg);
+
+	/* The tile data size varies between implementations. */
+	if (nr == XFEATURE_XTILE_DATA)
+		return check_xtile_data_against_struct(sz);
 
 	/*
 	 * Make *SURE* to add any feature numbers in below if
@@ -648,7 +726,7 @@ static int check_xstate_against_struct(int nr)
 	if ((nr < XFEATURE_YMM) ||
 	    (nr >= XFEATURE_MAX) ||
 	    (nr == XFEATURE_PT_UNIMPLEMENTED_SO_FAR) ||
-	    ((nr >= XFEATURE_RSRVD_COMP_11) && (nr <= XFEATURE_LBR))) {
+	    ((nr >= XFEATURE_RSRVD_COMP_11) && (nr <= XFEATURE_RSRVD_COMP_16))) {
 		pr_err("no structure for xstate: %d\n", nr);
 		XSTATE_WARN_ON(1);
 		return -EINVAL;
-- 
2.17.1


* [PATCH v11 21/29] x86/fpu/amx: Initialize child's AMX state
  2021-10-01 22:36 [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (19 preceding siblings ...)
  2021-10-01 22:37 ` [PATCH v11 20/29] x86/fpu/amx: Define AMX state components and have it used for boot-time checks Chang S. Bae
@ 2021-10-01 22:37 ` Chang S. Bae
  2021-10-01 22:37 ` [PATCH v11 22/29] x86/fpu/amx: Enable the AMX feature in 64-bit mode Chang S. Bae
                   ` (10 subsequent siblings)
  31 siblings, 0 replies; 45+ messages in thread
From: Chang S. Bae @ 2021-10-01 22:37 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, lenb, dave.hansen, thiago.macieira, jing2.liu,
	ravi.v.shankar, linux-kernel, chang.seok.bae

Ensure that a forked child starts with its AMX registers in the INIT-state.
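The clone-time change relies on XSAVE header semantics. When a component's bit is clear in XSTATE_BV, XRSTOR loads that component in its INIT state instead of reading it from the buffer. A minimal sketch of the mask arithmetic, with hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

#define XFEATURE_XTILE_CFG	17
#define XFEATURE_XTILE_DATA	18
#define XFEATURE_MASK_XTILE	((1ULL << XFEATURE_XTILE_CFG) | \
				 (1ULL << XFEATURE_XTILE_DATA))

/* Child's XSTATE_BV: drop the AMX bits (only when AMX is enabled at
 * all) so the next XRSTOR puts TILECFG/TILEDATA in their INIT state. */
static uint64_t child_xstate_bv(uint64_t parent_bv, uint64_t enabled_mask)
{
	if (enabled_mask & XFEATURE_MASK_XTILE)
		return parent_bv & ~XFEATURE_MASK_XTILE;
	return parent_bv;
}
```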

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v5:
* Introduced a new define. (Andy Lutomirski)

Changes from v4:
* Added as a new patch. This was missing on previous versions.
---
 arch/x86/include/asm/fpu/xstate.h | 3 +++
 arch/x86/kernel/fpu/core.c        | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index e3d431e04bbf..0e355dfe711b 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -85,6 +85,9 @@
 				      XFEATURE_MASK_INDEPENDENT | \
 				      XFEATURE_MASK_SUPERVISOR_UNSUPPORTED)
 
+/* Volatile states that a child does not inherit. */
+#define XFEATURE_MASK_CLEARED_ON_CLONE	XFEATURE_MASK_XTILE
+
 #ifdef CONFIG_X86_64
 #define REX_PREFIX	"0x48, "
 #else
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index b722360d2a85..fdac0f430af3 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -300,6 +300,9 @@ int fpu_clone(struct task_struct *dst)
 		save_fpregs_to_fpstate(dst_fpu);
 	fpregs_unlock();
 
+	if (xfeatures_mask_all & XFEATURE_MASK_CLEARED_ON_CLONE)
+		dst_fpu->state->xsave.header.xfeatures &= ~XFEATURE_MASK_CLEARED_ON_CLONE;
+
 	set_tsk_thread_flag(dst, TIF_NEED_FPU_LOAD);
 
 	trace_x86_fpu_copy_src(src_fpu);
-- 
2.17.1


* [PATCH v11 22/29] x86/fpu/amx: Enable the AMX feature in 64-bit mode
  2021-10-01 22:36 [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (20 preceding siblings ...)
  2021-10-01 22:37 ` [PATCH v11 21/29] x86/fpu/amx: Initialize child's AMX state Chang S. Bae
@ 2021-10-01 22:37 ` Chang S. Bae
  2021-10-01 22:37 ` [PATCH v11 23/29] x86/fpu/xstate: Skip writing zeros to signal frame for dynamic user states if in INIT-state Chang S. Bae
                   ` (9 subsequent siblings)
  31 siblings, 0 replies; 45+ messages in thread
From: Chang S. Bae @ 2021-10-01 22:37 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, lenb, dave.hansen, thiago.macieira, jing2.liu,
	ravi.v.shankar, linux-kernel, chang.seok.bae

In 64-bit mode, include the AMX state components in
XFEATURE_MASK_USER_SUPPORTED.
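The sketch below shows the resulting mask computation. A hypothetical is_64bit parameter stands in for the compile-time IS_ENABLED(CONFIG_X86_64) check, so the logic can be exercised in user space:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XFEATURE_MASK_XTILE	((1ULL << 17) | (1ULL << 18))

/* Keep the CPU-reported features the kernel supports, then drop the
 * AMX components on non-64-bit kernels. */
static uint64_t supported_xfeatures(uint64_t cpu_mask, uint64_t supported,
				    bool is_64bit)
{
	uint64_t mask = cpu_mask & supported;

	if (!is_64bit)
		mask &= ~XFEATURE_MASK_XTILE;
	return mask;
}
```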

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v5:
* Adjusted macro changes and moved the disabling code for non-64-bit mode
  for the new base changes.

Changes from v4:
* Removed the irrelevant line from the changelog. (Thomas Gleixner)
---
 arch/x86/include/asm/fpu/xstate.h | 3 ++-
 arch/x86/kernel/fpu/xstate.c      | 6 +++++-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 0e355dfe711b..0b337e913423 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -37,7 +37,8 @@
 				      XFEATURE_MASK_Hi16_ZMM	 | \
 				      XFEATURE_MASK_PKRU | \
 				      XFEATURE_MASK_BNDREGS | \
-				      XFEATURE_MASK_BNDCSR)
+				      XFEATURE_MASK_BNDCSR | \
+				      XFEATURE_MASK_XTILE)
 
 /* Require ARCH_SET_STATE_ENABLE from XTILE_CFG and later states */
 #define XFEATURE_MASK_PERMISSION_REQUIRED GENMASK_ULL(63, XFEATURE_XTILE_CFG)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 2ebc98c4b496..43539893dd82 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -492,7 +492,8 @@ static void __init print_xstate_offset_size(void)
 	 XFEATURE_MASK_PKRU |			\
 	 XFEATURE_MASK_BNDREGS |		\
 	 XFEATURE_MASK_BNDCSR |			\
-	 XFEATURE_MASK_PASID)
+	 XFEATURE_MASK_PASID |			\
+	 XFEATURE_MASK_XTILE)
 
 /*
  * setup the xstate image representing the init state
@@ -1008,6 +1009,9 @@ void __init fpu__init_system_xstate(void)
 	xfeatures_mask_all &= XFEATURE_MASK_USER_SUPPORTED |
 			      XFEATURE_MASK_SUPERVISOR_SUPPORTED;
 
+	if (!IS_ENABLED(CONFIG_X86_64))
+		xfeatures_mask_all &= ~XFEATURE_MASK_XTILE;
+
 	/* Store it for paranoia check at the end */
 	xfeatures = xfeatures_mask_all;
 
-- 
2.17.1


* [PATCH v11 23/29] x86/fpu/xstate: Skip writing zeros to signal frame for dynamic user states if in INIT-state
  2021-10-01 22:36 [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (21 preceding siblings ...)
  2021-10-01 22:37 ` [PATCH v11 22/29] x86/fpu/amx: Enable the AMX feature in 64-bit mode Chang S. Bae
@ 2021-10-01 22:37 ` Chang S. Bae
  2021-10-01 22:37 ` [PATCH v11 24/29] selftest/x86/amx: Test cases for the AMX state management Chang S. Bae
                   ` (8 subsequent siblings)
  31 siblings, 0 replies; 45+ messages in thread
From: Chang S. Bae @ 2021-10-01 22:37 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, lenb, dave.hansen, thiago.macieira, jing2.liu,
	ravi.v.shankar, linux-kernel, chang.seok.bae

By default, for XSTATE features in the INIT-state, XSAVE writes zeros to
the uncompressed destination buffer.

E.g., if you are not using AVX-512, you will still get a bunch of zeros on
the signal stack where live AVX-512 data would go.

For permission-required states (currently the AMX state) that are in the
INIT-state, explicitly skip this data transfer. As a result, XSAVE leaves
the AMX region of the user buffer untouched.

[ Reading XINUSE takes about 20-30 cycles, but writing zeros costs about
  five times as much or more, e.g., for XTILEDATA. ]
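The mask selection in xsave_to_user_sigframe() can be modeled as a pure function. In this sketch, the names are hypothetical and xinuse stands for the value of XGETBV with ECX=1, or for the per-task state mask when XGETBV1 is unavailable:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Components requested from XSAVE for a signal frame. States that need
 * no permission are always requested. Permission-required states are
 * added only for a thread with an expanded frame, and only if they are
 * in use, so INIT-state AMX data is not written to the frame.
 */
static uint64_t sigframe_mask(uint64_t uabi, uint64_t perm, bool expanded,
			      uint64_t xinuse)
{
	uint64_t mask = uabi & ~perm;

	if (expanded)
		mask |= uabi & xinuse & perm;
	return mask;
}
```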

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v10:
* Simplify the sigframe XSAVE code: replace check for XFD STATE with
  XTILECFG and later STATE.

Changes from v9:
* Use cpu_feature_enabled() instead of boot_cpu_has(). (Borislav Petkov)

Changes from v5:
* Mentioned the optimization trade-offs in the changelog. (Dave Hansen)
* Added code comment.

Changes from v4:
* Added as a new patch.
---
 arch/x86/include/asm/fpu/internal.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 06be4c247c97..5f013fa0b205 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -355,8 +355,12 @@ static inline int xsave_to_user_sigframe(struct xregs_state __user *buf)
 		mask = uabi_mask & ~xfeatures_mask_user_perm();
 
 		if (sig_xstate_expanded(current)) {
-			u64 cur_uabi_mask = uabi_mask & current->thread.fpu.state_mask;
+			u64 cur_uabi_mask;
 
+			if (cpu_feature_enabled(X86_FEATURE_XGETBV1))
+				cur_uabi_mask = uabi_mask & xgetbv(1);
+			else
+				cur_uabi_mask = uabi_mask & current->thread.fpu.state_mask;
 			mask |= cur_uabi_mask & xfeatures_mask_user_perm();
 		}
 	}
-- 
2.17.1


* [PATCH v11 24/29] selftest/x86/amx: Test cases for the AMX state management
  2021-10-01 22:36 [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (22 preceding siblings ...)
  2021-10-01 22:37 ` [PATCH v11 23/29] x86/fpu/xstate: Skip writing zeros to signal frame for dynamic user states if in INIT-state Chang S. Bae
@ 2021-10-01 22:37 ` Chang S. Bae
  2021-10-01 22:37 ` [PATCH v11 25/29] x86/insn/amx: Add TILERELEASE instruction to the opcode map Chang S. Bae
                   ` (7 subsequent siblings)
  31 siblings, 0 replies; 45+ messages in thread
From: Chang S. Bae @ 2021-10-01 22:37 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, lenb, dave.hansen, thiago.macieira, jing2.liu,
	ravi.v.shankar, linux-kernel, chang.seok.bae, linux-kselftest

This selftest verifies that the XSTATE arch_prctl works for AMX state and
that a forked task has the AMX state in the INIT-state.

In addition, this test verifies that the kernel correctly context switches
unique AMX data, when multiple threads are using AMX. The test also
verifies that ptrace() can insert data into existing threads.

Finally, add a test case to verify that unused states are excluded, by
leaving a known pattern on the signal stack and verifying that it is still
intact after taking a subsequent signal.

These test cases do not depend on compiler support for AMX, because they
access the AMX state directly with XSAVE instructions from user space.
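The known-pattern technique can be sketched as two helpers. This is a hypothetical simplification, not the selftest's actual code, which is not shown in this excerpt. A region is filled before a signal and checked afterwards, and any byte that differs shows that the kernel wrote to it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Position-dependent byte pattern, so shifted or zeroed data is caught. */
static uint8_t pattern_byte(size_t i)
{
	return (uint8_t)(0xA5 ^ i);
}

static void fill_pattern(uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		buf[i] = pattern_byte(i);
}

/* True if no byte of the region was modified since fill_pattern(). */
static bool pattern_intact(const uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (buf[i] != pattern_byte(i))
			return false;
	return true;
}
```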

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
---
Changes from v10:
* Update the selftest for v11 (e.g. signal delivery changes for XTILECFG
  state).

Changes from v9:
* Minor cleanup on the ptrace test code.

Changes from v8:
* Adjust for the arch_prctl change.
* Assure XTILECFG is recovered upon sigreturn.

Changes from v7:
* Adjust for SIGILL.
* Test XTILECFG for legacy signal delivery.

Changes from v6:
* Adjust for the syscall and ptrace path changes.

Changes from v5:
* Adjusted arch_prctl for the updated ABI.
* Added test for the dynamic signal xstate buffer.
* Fixed XSAVE buffer's header data.

Changes from v4:
* Added test for arch_prctl.
* Excluded tile config details to focus on testing the kernel's ability to
  manage dynamic user state.
* Removed tile instructions.
* Simplified the fork() and ptrace() test routine.
* Massaged the changelog.

Changes from v2:
* Updated the test messages and the changelog, as tile data is no longer
  inherited by a child.
* Removed bytecode for the instructions already supported by binutils.
* Changed to check the XSAVE availability in a reliable way.

Changes from v1:
* Removed signal testing code
---
 tools/testing/selftests/x86/Makefile |    2 +-
 tools/testing/selftests/x86/amx.c    | 1048 ++++++++++++++++++++++++++
 2 files changed, 1049 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/x86/amx.c

diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index b4142cd1c5c2..8a1f62ab3c8e 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -18,7 +18,7 @@ TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \
 			test_FCMOV test_FCOMI test_FISTTP \
 			vdso_restorer
 TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip syscall_numbering \
-			corrupt_xstate_header
+			corrupt_xstate_header amx
 # Some selftests require 32bit support enabled also on 64bit systems
 TARGETS_C_32BIT_NEEDED := ldt_gdt ptrace_syscall
 
diff --git a/tools/testing/selftests/x86/amx.c b/tools/testing/selftests/x86/amx.c
new file mode 100644
index 000000000000..ee18697b08ee
--- /dev/null
+++ b/tools/testing/selftests/x86/amx.c
@@ -0,0 +1,1048 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+#include <err.h>
+#include <errno.h>
+#include <elf.h>
+#include <pthread.h>
+#include <setjmp.h>
+#include <stdio.h>
+#include <string.h>
+#include <stdbool.h>
+#include <unistd.h>
+#include <x86intrin.h>
+
+#include <linux/futex.h>
+
+#include <sys/auxv.h>
+#include <sys/mman.h>
+#include <sys/ptrace.h>
+#include <sys/shm.h>
+#include <sys/syscall.h>
+#include <sys/wait.h>
+#include <sys/uio.h>
+
+#ifndef __x86_64__
+# error This test is 64-bit only
+#endif
+
+static inline uint64_t xgetbv(uint32_t index)
+{
+	uint32_t eax, edx;
+
+	asm volatile("xgetbv;"
+		     : "=a" (eax), "=d" (edx)
+		     : "c" (index));
+	return eax + ((uint64_t)edx << 32);
+}
+
+static inline void cpuid(uint32_t *eax, uint32_t *ebx, uint32_t *ecx, uint32_t *edx)
+{
+	asm volatile("cpuid;"
+		     : "=a" (*eax), "=b" (*ebx), "=c" (*ecx), "=d" (*edx)
+		     : "0" (*eax), "2" (*ecx));
+}
+
+static inline void xsave(void *xbuf, uint32_t lo, uint32_t hi)
+{
+	asm volatile("xsave (%%rdi)"
+		     : : "D" (xbuf), "a" (lo), "d" (hi)
+		     : "memory");
+}
+
+static inline void xrstor(void *xbuf, uint32_t lo, uint32_t hi)
+{
+	asm volatile("xrstor (%%rdi)"
+		     : : "D" (xbuf), "a" (lo), "d" (hi) : "memory");
+}
+
+static void sethandler(int sig, void (*handler)(int, siginfo_t *, void *),
+		       int flags)
+{
+	struct sigaction sa;
+
+	memset(&sa, 0, sizeof(sa));
+	sa.sa_sigaction = handler;
+	sa.sa_flags = SA_SIGINFO | flags;
+	sigemptyset(&sa.sa_mask);
+	if (sigaction(sig, &sa, 0))
+		err(1, "sigaction");
+}
+
+static void clearhandler(int sig)
+{
+	struct sigaction sa;
+
+	memset(&sa, 0, sizeof(sa));
+	sa.sa_handler = SIG_DFL;
+	sigemptyset(&sa.sa_mask);
+	if (sigaction(sig, &sa, 0))
+		err(1, "sigaction");
+}
+
+static jmp_buf jmpbuf;
+
+/* Hardware info check: */
+
+static bool noxsave;
+
+static void handle_noxsave(int sig, siginfo_t *si, void *ctx_void)
+{
+	noxsave = true;
+	siglongjmp(jmpbuf, 1);
+}
+
+#define XFEATURE_XTILECFG	17
+#define XFEATURE_XTILEDATA	18
+#define XFEATURE_MASK_XTILECFG	(1 << XFEATURE_XTILECFG)
+#define XFEATURE_MASK_XTILEDATA	(1 << XFEATURE_XTILEDATA)
+#define XFEATURE_MASK_XTILE	(XFEATURE_MASK_XTILECFG | XFEATURE_MASK_XTILEDATA)
+
+static inline bool check_xtile(void)
+{
+	bool xtile_enable;
+
+	sethandler(SIGILL, handle_noxsave, 0);
+
+	if ((!sigsetjmp(jmpbuf, 1)) && (xgetbv(0) & XFEATURE_MASK_XTILE)) {
+		xtile_enable = true;
+		goto out;
+	}
+	xtile_enable = false;
+out:
+	clearhandler(SIGILL);
+	return xtile_enable;
+}
+
+static uint32_t xsave_size;
+static uint32_t xsave_xtiledata_offset, xsave_xtilecfg_offset;
+static uint32_t xtiledata_size, xtilecfg_size;
+
+static struct _tile_spec {
+	uint16_t bytes_per_row;
+	uint16_t max_names;
+	uint16_t max_rows;
+} tile_spec;
+
+#define XSTATE_CPUID			0xd
+#define XSTATE_USER_STATE_SUBLEAVE	0x0
+#define TILE_CPUID			0x1d
+#define TILE_PALETTE_ID			0x1
+
+static void check_cpuid(void)
+{
+	uint32_t eax, ebx, ecx, edx;
+
+	eax = XSTATE_CPUID;
+	ecx = XSTATE_USER_STATE_SUBLEAVE;
+
+	cpuid(&eax, &ebx, &ecx, &edx);
+	if (!ebx)
+		err(1, "xstate cpuid: xsave size");
+
+	xsave_size = ebx;
+
+	eax = XSTATE_CPUID;
+	ecx = XFEATURE_XTILECFG;
+
+	cpuid(&eax, &ebx, &ecx, &edx);
+	if (!eax || !ebx)
+		err(1, "xstate cpuid: tile config state");
+
+	xtilecfg_size = eax;
+	xsave_xtilecfg_offset = ebx;
+
+	eax = XSTATE_CPUID;
+	ecx = XFEATURE_XTILEDATA;
+
+	cpuid(&eax, &ebx, &ecx, &edx);
+	if (!eax || !ebx)
+		err(1, "xstate cpuid: tile data state");
+
+	xtiledata_size = eax;
+	xsave_xtiledata_offset = ebx;
+
+	eax = TILE_CPUID;
+	ecx = TILE_PALETTE_ID;
+
+	cpuid(&eax, &ebx, &ecx, &edx);
+	if (!eax || !ebx || !ecx)
+		err(1, "tile cpuid: palette 1");
+
+	tile_spec.max_names = ebx >> 16;
+	tile_spec.bytes_per_row = ebx;
+	tile_spec.max_rows = ecx;
+}
+
+/* The helpers for managing XSAVE buffer and tile states: */
+
+void *alloc_xsave_buffer(void)
+{
+	void *xbuf;
+
+	/* XSAVE buffer must be 64B-aligned and start with a clean header. */
+	xbuf = aligned_alloc(64, xsave_size);
+	if (!xbuf)
+		err(1, "aligned_alloc()");
+	return memset(xbuf, 0, xsave_size);
+}
+
+#define XSAVE_HDR_OFFSET	512
+#define XSAVE_HDR_SIZE		64
+
+static inline void clear_xstate_header(void *buffer)
+{
+	memset(buffer + XSAVE_HDR_OFFSET, 0, XSAVE_HDR_SIZE);
+}
+
+static inline uint64_t get_xstatebv(void *buffer)
+{
+	return *(uint64_t *)(buffer + XSAVE_HDR_OFFSET);
+}
+
+static inline void set_xstatebv(void *buffer, uint64_t bv)
+{
+	*(uint64_t *)(buffer + XSAVE_HDR_OFFSET) = bv;
+}
+
+static void set_rand_tiledata(void *tiledata)
+{
+	int *ptr = tiledata;
+	int data = rand();
+	int i;
+
+	for (i = 0; i < xtiledata_size / sizeof(int); i++, ptr++)
+		*ptr = data;
+}
+
+#define	MAX_TILES		16
+#define RESERVED_BYTES		14
+
+struct tile_config {
+	uint8_t  palette_id;
+	uint8_t  start_row;
+	uint8_t  reserved[RESERVED_BYTES];
+	uint16_t colsb[MAX_TILES];
+	uint8_t  rows[MAX_TILES];
+};
+
+static void set_tilecfg(void *tilecfg)
+{
+	struct tile_config *cfg = tilecfg;
+	int i;
+
+	memset(cfg, 0, sizeof(*cfg));
+	cfg->palette_id = TILE_PALETTE_ID;
+	for (i = 0; i < tile_spec.max_names; i++) {
+		cfg->colsb[i] = tile_spec.bytes_per_row;
+		cfg->rows[i] = tile_spec.max_rows;
+	}
+}
+
+static void *xsave_buffer, *tiledata, *tilecfg;
+static int nerrs, errs;
+
+/* See 'struct _fpx_sw_bytes' at sigcontext.h */
+#define SW_BYTES_OFFSET		464
+/* N.B. The struct's field name varies so read from the offset. */
+#define SW_BYTES_BV_OFFSET	(SW_BYTES_OFFSET + 8)
+
+static inline struct _fpx_sw_bytes *get_fpx_sw_bytes(void *buffer)
+{
+	return (struct _fpx_sw_bytes *)(buffer + SW_BYTES_OFFSET);
+}
+
+static inline uint64_t get_fpx_sw_bytes_xstatebv(void *buffer)
+{
+	return *(uint64_t *)(buffer + SW_BYTES_BV_OFFSET);
+}
+
+static volatile bool noperm;
+static bool check_tilecfg_sigframe;
+
+static void handle_noperm(int sig, siginfo_t *si, void *ctx_void)
+{
+	ucontext_t *ctx = (ucontext_t *)ctx_void;
+	void *xbuf = ctx->uc_mcontext.fpregs;
+	struct _fpx_sw_bytes *sw_bytes;
+
+	printf("\tAt SIGILL handler,\n");
+
+	if (si->si_code != ILL_ILLOPC) {
+		errs++;
+		printf("[FAIL]\tInvalid signal code (%x).\n", si->si_code);
+	} else {
+		printf("[OK]\tValid signal code (ILL_ILLOPC).\n");
+	}
+
+	sw_bytes = get_fpx_sw_bytes(xbuf);
+	if ((sw_bytes->xstate_size < xsave_xtiledata_offset) &&
+	    !(get_fpx_sw_bytes_xstatebv(xbuf) & XFEATURE_MASK_XTILEDATA)) {
+		printf("[OK]\tValid xstate size and mask in the SW data of xstate buffer.\n");
+	} else {
+		errs++;
+		printf("[FAIL]\tInvalid xstate size and/or mask in the SW data of xstate buffer.\n");
+	}
+
+	if (check_tilecfg_sigframe) {
+		if (memcmp(tilecfg, xbuf + xsave_xtilecfg_offset, xtilecfg_size)) {
+			printf("[OK]\tTILECFG state is not copied.\n");
+		} else {
+			errs++;
+			printf("[FAIL]\tTILECFG state is copied.\n");
+		}
+	}
+
+	noperm = true;
+	ctx->uc_mcontext.gregs[REG_RIP] += 3; /* Skip the faulting XRSTOR */
+}
+
+/* Return true if XRSTOR is successful; otherwise, false.  */
+static inline bool xrstor_safe(void *buffer, uint32_t lo, uint32_t hi)
+{
+	noperm = false;
+	xrstor(buffer, lo, hi);
+	return !noperm;
+}
+
+/* arch_prctl test */
+
+#define ARCH_SET_STATE_ENABLE	0x1021
+#define ARCH_GET_STATE_ENABLE	0x1022
+
+static bool perm_fail_expected;
+
+static void enable_tiledata(void)
+{
+	unsigned long bitmask;
+	long rc;
+
+	rc = syscall(SYS_arch_prctl, ARCH_SET_STATE_ENABLE, XFEATURE_MASK_XTILEDATA);
+
+	if (perm_fail_expected) {
+		if (rc) {
+			printf("[OK]\tARCH_SET_STATE_ENABLE failed.\n");
+			return;
+		}
+
+		nerrs++;
+		printf("[FAIL]\tARCH_SET_STATE_ENABLE succeeded.\n");
+	} else if (rc) {
+		goto fail;
+	}
+
+	rc = syscall(SYS_arch_prctl, ARCH_GET_STATE_ENABLE, &bitmask);
+	if (rc) {
+		err(1, "ARCH_GET_STATE_ENABLE");
+	} else if (bitmask & XFEATURE_MASK_XTILEDATA) {
+		printf("\t\tARCH_SET_STATE_ENABLE succeeded.\n");
+		return;
+	}
+
+fail:
+	err(1, "ARCH_SET_STATE_ENABLE");
+}
+
+#define TEST_EXECV_ARG		"nested"
+
+#ifndef AT_MINSIGSTKSZ
+#  define AT_MINSIGSTKSZ	51
+#endif
+
+static bool sigaltstack_fail_expected;
+
+static void setup_altstack(void *addr, unsigned long size)
+{
+	stack_t ss;
+	int rc;
+
+	memset(&ss, 0, sizeof(ss));
+	ss.ss_size = size;
+	ss.ss_sp = addr;
+
+	rc = sigaltstack(&ss, NULL);
+
+	if (sigaltstack_fail_expected) {
+		if (rc) {
+			printf("[OK]\tsigaltstack() failed.\n");
+		} else {
+			nerrs++;
+			printf("[FAIL]\tsigaltstack() succeeded.\n");
+		}
+	} else if (rc) {
+		err(1, "sigaltstack()");
+	}
+}
+
+static void test_arch_prctl(int argc, char **argv)
+{
+	pid_t parent, child, grandchild;
+	void *altstack;
+
+	parent = fork();
+	if (parent < 0) {
+		err(1, "fork");
+	} else if (parent > 0) {
+		int status;
+
+		wait(&status);
+		if (!WIFEXITED(status) || WEXITSTATUS(status))
+			err(1, "arch_prctl test parent exit");
+		return;
+	}
+
+	printf("[RUN]\tCheck ARCH_SET_STATE_ENABLE around process fork() and sigaltstack() test.\n");
+
+	printf("\tFork a child.\n");
+	child = fork();
+	if (child < 0) {
+		err(1, "fork");
+	} else if (child > 0) {
+		int status;
+
+		perm_fail_expected = false;
+		printf("\tDo ARCH_SET_STATE_ENABLE at parent:\n");
+		enable_tiledata();
+
+		wait(&status);
+		if (!WIFEXITED(status) || WEXITSTATUS(status))
+			err(1, "arch_prctl test child exit");
+		_exit(0);
+	}
+
+	clear_xstate_header(xsave_buffer);
+
+	/* Scribble XTILECFG */
+	set_xstatebv(xsave_buffer, XFEATURE_MASK_XTILECFG);
+	set_tilecfg(xsave_buffer + xsave_xtilecfg_offset);
+	xrstor(xsave_buffer, -1, -1);
+	memcpy(tilecfg, xsave_buffer + xsave_xtilecfg_offset, xtilecfg_size);
+
+	set_xstatebv(xsave_buffer, XFEATURE_MASK_XTILEDATA);
+	set_rand_tiledata(xsave_buffer + xsave_xtiledata_offset);
+
+	printf("\tLoad tile data without ARCH_SET_STATE_ENABLE at child.\n");
+	/*
+	 * Test XTILECFG state delivery via signal, when XTILEDATA is not
+	 * permitted. It should be also prevented.
+	 */
+	check_tilecfg_sigframe = true;
+	if (xrstor_safe(xsave_buffer, -1, -1)) {
+		nerrs++;
+		printf("[FAIL]\tSucceeded at child.\n");
+	} else {
+		printf("[OK]\tBlocked at child.\n");
+
+		/* Assure XTILECFG state recovery at sigreturn. */
+		printf("\tReturn from signal handler,\n");
+		xsave(xsave_buffer, XFEATURE_MASK_XTILECFG, 0);
+		if (memcmp(tilecfg, xsave_buffer + xsave_xtilecfg_offset, xtilecfg_size)) {
+			nerrs++;
+			printf("[FAIL]\tTilecfg is not restored.\n");
+		} else {
+			printf("[OK]\tTilecfg is restored.\n");
+		}
+	}
+
+	printf("\tARCH_SET_STATE_ENABLE failure test at child:\n");
+	printf("\t(after sigaltstack() with small size)\n");
+
+	altstack = mmap(NULL, getauxval(AT_MINSIGSTKSZ) * 2,
+			PROT_READ | PROT_WRITE,
+			MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
+	if (altstack == MAP_FAILED)
+		err(1, "mmap()");
+
+	sigaltstack_fail_expected = false;
+	setup_altstack(altstack, getauxval(AT_MINSIGSTKSZ) - 1);
+	perm_fail_expected = true;
+	enable_tiledata();
+
+	printf("\tARCH_SET_STATE_ENABLE again with enough altstack at child:\n");
+	sigaltstack_fail_expected = false;
+	setup_altstack(altstack, getauxval(AT_MINSIGSTKSZ) + 1);
+	perm_fail_expected = false;
+	enable_tiledata();
+
+	printf("\tsigaltstack() failure test at child:\n");
+	printf("\t(with small altstack after ARCH_SET_STATE_ENABLE)\n");
+	sigaltstack_fail_expected = true;
+	setup_altstack(altstack, getauxval(AT_MINSIGSTKSZ) - 1);
+
+	printf("\tLoad tile data with ARCH_SET_STATE_ENABLE at child:\n");
+	check_tilecfg_sigframe = false;
+	if (xrstor_safe(xsave_buffer, -1, -1)) {
+		printf("[OK]\tSucceeded at child.\n");
+	} else {
+		nerrs++;
+		printf("[FAIL]\tBlocked at child.\n");
+	}
+
+	printf("\tFork a grandchild.\n");
+	grandchild = fork();
+	if (grandchild < 0) {
+		err(1, "fork");
+	} else if (!grandchild) {
+		char *args[] = {argv[0], TEST_EXECV_ARG, NULL};
+
+		printf("\tLoad tile data at grandchild before execv():\n");
+		if (xrstor_safe(xsave_buffer, -1, -1)) {
+			printf("[OK]\tSucceeded at grandchild.\n");
+		} else {
+			nerrs++;
+			printf("[FAIL]\tBlocked at grandchild.\n");
+		}
+		execv(args[0], args);
+		err(1, "execv()");
+	} else {
+		int status;
+
+		wait(&status);
+		if (!WIFEXITED(status) || WEXITSTATUS(status))
+			err(1, "fork test grandchild");
+	}
+	_exit(0);
+}
+
+/* Testing tile data inheritance */
+
+static void test_fork(void)
+{
+	pid_t child, grandchild;
+
+	child = fork();
+	if (child < 0) {
+		err(1, "fork");
+	} else if (child > 0) {
+		int status;
+
+		wait(&status);
+		if (!WIFEXITED(status) || WEXITSTATUS(status))
+			err(1, "fork test child");
+		return;
+	}
+
+	printf("[RUN]\tCheck tile data inheritance.\n\tBefore fork(), load tile data -- yes:\n");
+
+	clear_xstate_header(xsave_buffer);
+	set_xstatebv(xsave_buffer, XFEATURE_MASK_XTILE);
+	set_tilecfg(xsave_buffer + xsave_xtilecfg_offset);
+	set_rand_tiledata(xsave_buffer + xsave_xtiledata_offset);
+	xrstor_safe(xsave_buffer, -1, -1);
+
+	grandchild = fork();
+	if (grandchild < 0) {
+		err(1, "fork");
+	} else if (grandchild > 0) {
+		int status;
+
+		wait(&status);
+		if (!WIFEXITED(status) || WEXITSTATUS(status))
+			err(1, "fork test grand child");
+		_exit(0);
+	}
+
+	if (xgetbv(1) & XFEATURE_MASK_XTILE) {
+		nerrs++;
+		printf("[FAIL]\tIn a child, AMX state is not initialized.\n");
+	} else {
+		printf("[OK]\tIn a child, AMX state is initialized.\n");
+	}
+	_exit(0);
+}
+
+/* Context switching test */
+
+#define ITERATIONS	10
+#define NUM_THREADS	5
+
+struct futex_info {
+	int current;
+	int *futex;
+	int next;
+};
+
+static inline void command_wait(struct futex_info *info, int value)
+{
+	do {
+		sched_yield();
+	} while (syscall(SYS_futex, info->futex, FUTEX_WAIT, value, 0, 0, 0));
+}
+
+static inline void command_wake(struct futex_info *info, int value)
+{
+	do {
+		*info->futex = value;
+		while (!syscall(SYS_futex, info->futex, FUTEX_WAKE, 1, 0, 0, 0))
+			sched_yield();
+	} while (0);
+}
+
+static inline int get_iterative_value(int id)
+{
+	return ((id << 1) & ~0x1);
+}
+
+static inline int get_endpoint_value(int id)
+{
+	return ((id << 1) | 0x1);
+}
+
+static void *check_tiledata(void *info)
+{
+	struct futex_info *finfo = (struct futex_info *)info;
+	void *xbuf, *tdata;
+	int i;
+
+	xbuf = alloc_xsave_buffer();
+	tdata = malloc(xtiledata_size);
+	if (!tdata)
+		err(1, "malloc()");
+
+	set_xstatebv(xbuf, XFEATURE_MASK_XTILEDATA);
+	set_rand_tiledata(xbuf + xsave_xtiledata_offset);
+	xrstor_safe(xbuf, -1, -1);
+	memcpy(tdata, xbuf + xsave_xtiledata_offset, xtiledata_size);
+
+	for (i = 0; i < ITERATIONS; i++) {
+		command_wait(finfo, get_iterative_value(finfo->current));
+
+		xsave(xbuf, XFEATURE_MASK_XTILEDATA, 0);
+		if (memcmp(tdata, xbuf + xsave_xtiledata_offset, xtiledata_size))
+			errs++;
+
+		set_rand_tiledata(xbuf + xsave_xtiledata_offset);
+		xrstor_safe(xbuf, -1, -1);
+		memcpy(tdata, xbuf + xsave_xtiledata_offset, xtiledata_size);
+
+		command_wake(finfo, get_iterative_value(finfo->next));
+	}
+
+	command_wait(finfo, get_endpoint_value(finfo->current));
+
+	free(xbuf);
+	free(tdata);
+	return NULL;
+}
+
+static int create_threads(int num, struct futex_info *finfo)
+{
+	const int shm_id = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0666);
+	int *futex = shmat(shm_id, NULL, 0);
+	pthread_t thread;
+	int i;
+
+	for (i = 0; i < num; i++) {
+		finfo[i].futex = futex;
+		finfo[i].current = i + 1;
+		finfo[i].next = (i + 2) % (num + 1);
+
+		if (pthread_create(&thread, NULL, check_tiledata, &finfo[i]))
+			err(1, "pthread_create()");
+	}
+	return 0;
+}
+
+static void test_context_switch(void)
+{
+	struct futex_info *finfo;
+	int i;
+
+	printf("[RUN]\tCheck tile data context switches.\n");
+	printf("\t# of context switches -- %u, # of threads -- %d:\n",
+	       ITERATIONS * NUM_THREADS, NUM_THREADS);
+
+	errs = 0;
+
+	finfo = malloc(sizeof(*finfo) * NUM_THREADS);
+	if (!finfo)
+		err(1, "malloc()");
+
+	create_threads(NUM_THREADS, finfo);
+
+	for (i = 0; i < ITERATIONS; i++) {
+		command_wake(finfo, get_iterative_value(1));
+		command_wait(finfo, get_iterative_value(0));
+	}
+
+	for (i = 1; i <= NUM_THREADS; i++)
+		command_wake(finfo, get_endpoint_value(i));
+
+	if (errs) {
+		nerrs += errs;
+		printf("[FAIL]\tIncorrect cases were found -- (%d / %u).\n",
+		       errs, ITERATIONS * NUM_THREADS);
+	} else {
+		printf("[OK]\tNo incorrect case was found.\n");
+	}
+
+	free(finfo);
+}
+
+/* Ptrace test */
+
+static bool ptracee_state_perm;
+
+static int inject_tiledata(pid_t target)
+{
+	struct iovec iov;
+
+	iov.iov_base = xsave_buffer;
+	iov.iov_len = xsave_size;
+
+	clear_xstate_header(xsave_buffer);
+	set_xstatebv(xsave_buffer, XFEATURE_MASK_XTILEDATA);
+	set_rand_tiledata(xsave_buffer + xsave_xtiledata_offset);
+	memcpy(tiledata, xsave_buffer + xsave_xtiledata_offset, xtiledata_size);
+
+	if (ptrace(PTRACE_SETREGSET, target, (uint32_t)NT_X86_XSTATE, &iov)) {
+		if (errno != EFAULT)
+			err(1, "PTRACE_SETREGSET");
+		else
+			return errno;
+	}
+
+	if (ptrace(PTRACE_GETREGSET, target, (uint32_t)NT_X86_XSTATE, &iov))
+		err(1, "PTRACE_GETREGSET");
+
+	if (!memcmp(tiledata, xsave_buffer + xsave_xtiledata_offset, xtiledata_size))
+		return 0;
+	else
+		return -1;
+}
+
+static void test_tile_write(void)
+{
+	int status, rc;
+	pid_t child;
+
+	child = fork();
+	if (child < 0) {
+		err(1, "fork");
+	} else if (!child) {
+		if (ptracee_state_perm) {
+			printf("\tARCH_SET_STATE_ENABLE at ptracee:\n");
+			perm_fail_expected = false;
+			enable_tiledata();
+		}
+
+		if (ptrace(PTRACE_TRACEME, 0, NULL, NULL))
+			err(1, "PTRACE_TRACEME");
+
+		raise(SIGTRAP);
+		_exit(0);
+	}
+
+	do {
+		wait(&status);
+	} while (WSTOPSIG(status) != SIGTRAP);
+
+	printf("\tInject tile data with%s ARCH_SET_STATE_ENABLE\n",
+	       ptracee_state_perm ? "" : "out");
+
+	rc = inject_tiledata(child);
+	if (rc == EFAULT && !ptracee_state_perm) {
+		printf("[OK]\tTile data was not written on ptracee.\n");
+	} else if (!rc && ptracee_state_perm) {
+		printf("[OK]\tTile data was written on ptracee.\n");
+	} else {
+		nerrs++;
+		printf("[FAIL]\tTile data was %swritten on ptracee.\n",
+		       rc ? "not " : "");
+	}
+
+	ptrace(PTRACE_DETACH, child, NULL, NULL);
+	wait(&status);
+	if (!WIFEXITED(status) || WEXITSTATUS(status))
+		err(1, "ptrace test");
+}
+
+static void test_ptrace(void)
+{
+	printf("[RUN]\tCheck ptrace() to inject tile data.\n");
+
+	ptracee_state_perm = false;
+	test_tile_write();
+
+	ptracee_state_perm = true;
+	test_tile_write();
+}
+
+/* Signal handling test */
+
+static bool init_tiledata, load_tiledata;
+static volatile bool signaled, sigstk_prefill;
+
+#define SIGFRAME_TILEDATA_SIGNATURE	0xEE
+
+static void handle_sigstk_prefill(int sig, siginfo_t *info, void *ctx_void)
+{
+	void *xbuf = ((ucontext_t *)ctx_void)->uc_mcontext.fpregs;
+	struct _fpx_sw_bytes *sw_bytes = get_fpx_sw_bytes(xbuf);
+
+	if (sw_bytes->xstate_size >= (xsave_xtiledata_offset + xtiledata_size)) {
+		memset(xbuf + xsave_xtiledata_offset, SIGFRAME_TILEDATA_SIGNATURE,
+		       xtiledata_size);
+	}
+
+	sigstk_prefill = true;
+}
+
+static void handle_signal(int sig, siginfo_t *info, void *ctx_void)
+{
+	bool tiledata_area, tiledata_bit, tiledata_inuse;
+	void *xbuf = ((ucontext_t *)ctx_void)->uc_mcontext.fpregs;
+	struct _fpx_sw_bytes *sw_bytes = get_fpx_sw_bytes(xbuf);
+	char d = SIGFRAME_TILEDATA_SIGNATURE;
+	int i;
+
+	printf("\tAt signal delivery,\n");
+
+	/* Check SW reserved data in the buffer: */
+	if ((sw_bytes->xstate_size >= (xsave_xtiledata_offset + xtiledata_size)) &&
+	    (get_fpx_sw_bytes_xstatebv(xbuf) & XFEATURE_MASK_XTILEDATA)) {
+		printf("[OK]\tValid xstate size and mask in the SW data of xstate buffer\n");
+	} else {
+		errs++;
+		printf("[FAIL]\tInvalid xstate size and/or mask in the SW data of xstate buffer\n");
+	}
+
+	/* Check XSAVE buffer header: */
+	tiledata_inuse = (load_tiledata && !init_tiledata);
+	tiledata_bit = get_xstatebv(xbuf) & XFEATURE_MASK_XTILEDATA;
+
+	if (tiledata_bit == tiledata_inuse) {
+		printf("[OK]\tTiledata bit is %sset in XSTATE_BV of xstate buffer.\n",
+		       tiledata_bit ? "" : "not ");
+	} else {
+		errs++;
+		printf("[FAIL]\tTiledata bit is %sset in XSTATE_BV of xstate buffer.\n",
+		       tiledata_bit ? "" : "not ");
+	}
+
+	/*
+	 * Check the sigframe data:
+	 */
+
+	tiledata_inuse = (load_tiledata && !init_tiledata);
+	tiledata_area = false;
+	if (sw_bytes->xstate_size >= (xsave_xtiledata_offset + xtiledata_size)) {
+		for (i = 0; i < xtiledata_size; i++) {
+			if (memcmp(xbuf + xsave_xtiledata_offset + i, &d, 1)) {
+				tiledata_area = true;
+				break;
+			}
+		}
+	}
+
+	if (tiledata_area == tiledata_inuse) {
+		printf("[OK]\tTiledata is %ssaved in signal buffer.\n",
+		       tiledata_area ? "" : "not ");
+	} else {
+		errs++;
+		printf("[FAIL]\tTiledata is %ssaved in signal buffer.\n",
+		       tiledata_area ? "" : "not ");
+	}
+
+	/* Load random tiledata to test sigreturn: */
+	clear_xstate_header(xsave_buffer);
+	set_xstatebv(xsave_buffer, XFEATURE_MASK_XTILEDATA);
+	set_rand_tiledata(xsave_buffer + xsave_xtiledata_offset);
+	xrstor_safe(xsave_buffer, -1, -1);
+	signaled = true;
+}
+
+static void test_signal_handling(void)
+{
+	pid_t child;
+
+	signaled = false;
+	sigstk_prefill = false;
+
+	child = fork();
+	if (child < 0) {
+		err(1, "fork");
+	} else if (child > 0) {
+		do {
+			int status;
+
+			wait(&status);
+			if (WIFSTOPPED(status))
+				kill(child, SIGCONT);
+			else if (!WIFEXITED(status) || WEXITSTATUS(status))
+				err(1, "signal test child");
+			else
+				break;
+		} while (1);
+		return;
+	}
+
+	printf("\tBefore signal, load tile data -- %s", load_tiledata ? "yes, " : "no:\n");
+	if (load_tiledata)
+		printf("re-initialized -- %s:\n", init_tiledata ? "yes" : "no");
+
+	/*
+	 * Raise SIGUSR1 to pre-fill sig stack. Also, load tiledata to size the pre-fill.
+	 */
+
+	if (load_tiledata) {
+		clear_xstate_header(xsave_buffer);
+		set_xstatebv(xsave_buffer, XFEATURE_MASK_XTILEDATA);
+		xrstor_safe(xsave_buffer, -1, -1);
+	}
+
+	raise(SIGUSR1);
+	if (!sigstk_prefill)
+		err(1, "SIGUSR1");
+
+	/*
+	 * Raise SIGALRM to test AMX state handling in signal delivery. Set up the state and
+	 * data before the test.
+	 */
+
+	if (load_tiledata) {
+		clear_xstate_header(xsave_buffer);
+		set_xstatebv(xsave_buffer, XFEATURE_MASK_XTILEDATA);
+		set_rand_tiledata(xsave_buffer + xsave_xtiledata_offset);
+		xrstor_safe(xsave_buffer, -1, -1);
+
+		if (init_tiledata) {
+			clear_xstate_header(xsave_buffer);
+			set_xstatebv(xsave_buffer, 0);
+			xrstor_safe(xsave_buffer, -1, -1);
+			memset(tiledata, 0, xtiledata_size);
+		} else {
+			memcpy(tiledata, xsave_buffer + xsave_xtiledata_offset,
+			       xtiledata_size);
+		}
+	} else {
+		memset(tiledata, 0, xtiledata_size);
+	}
+
+	raise(SIGALRM);
+	if (!signaled)
+		err(1, "SIGALRM");
+
+	printf("\tReturn from signal handler,\n");
+	xsave(xsave_buffer, XFEATURE_MASK_XTILEDATA, 0);
+	if (memcmp(tiledata, xsave_buffer + xsave_xtiledata_offset, xtiledata_size)) {
+		errs++;
+		printf("[FAIL]\tTiledata is not restored.\n");
+	} else {
+		printf("[OK]\tTiledata is restored.\n");
+	}
+
+	if (errs)
+		nerrs++;
+	_exit(0);
+}
+
+static void test_signal(void)
+{
+	printf("[RUN]\tCheck tile data state in signal path:\n");
+
+	sethandler(SIGALRM, handle_signal, 0);
+	sethandler(SIGUSR1, handle_sigstk_prefill, 0);
+
+	load_tiledata = false;
+	init_tiledata = false;
+	errs = 0;
+	test_signal_handling();
+
+	load_tiledata = true;
+	init_tiledata = false;
+	errs = 0;
+	test_signal_handling();
+
+	load_tiledata = true;
+	init_tiledata = true;
+	errs = 0;
+	test_signal_handling();
+
+	clearhandler(SIGALRM);
+	clearhandler(SIGUSR1);
+}
+
+int main(int argc, char **argv)
+{
+	cpu_set_t cpuset;
+
+	if (argc == 2) {
+		int ret;
+
+		if (strcmp(argv[1], TEST_EXECV_ARG))
+			return 0;
+
+		check_cpuid();
+
+		printf("\tLoad tile data after execv().\n");
+
+		xsave_buffer = alloc_xsave_buffer();
+		clear_xstate_header(xsave_buffer);
+
+		set_xstatebv(xsave_buffer, XFEATURE_MASK_XTILE);
+		set_rand_tiledata(xsave_buffer + xsave_xtiledata_offset);
+
+		sethandler(SIGILL, handle_noperm, 0);
+
+		if (xrstor_safe(xsave_buffer, -1, -1)) {
+			printf("[FAIL]\tSucceeded after execv().\n");
+			ret = 1;
+		} else {
+			printf("[OK]\tBlocked after execv().\n");
+			ret = 0;
+		}
+
+		clearhandler(SIGILL);
+		free(xsave_buffer);
+		_exit(ret);
+	}
+
+	/* Check hardware availability at first */
+
+	if (!check_xtile()) {
+		printf("%s is disabled.\n", noxsave ? "XSAVE" : "AMX");
+		return 0;
+	}
+
+	check_cpuid();
+
+	xsave_buffer = alloc_xsave_buffer();
+	clear_xstate_header(xsave_buffer);
+
+	tiledata = malloc(xtiledata_size);
+	if (!tiledata)
+		err(1, "malloc()");
+
+	tilecfg = malloc(xtilecfg_size);
+	if (!tilecfg)
+		err(1, "malloc()");
+	set_tilecfg(tilecfg);
+
+	nerrs = 0;
+
+	sethandler(SIGILL, handle_noperm, 0);
+
+	CPU_ZERO(&cpuset);
+	CPU_SET(0, &cpuset);
+
+	if (sched_setaffinity(0, sizeof(cpuset), &cpuset) != 0)
+		err(1, "sched_setaffinity to CPU 0");
+
+	test_arch_prctl(argc, argv);
+	test_ptrace();
+
+	perm_fail_expected = false;
+	printf("\tARCH_SET_STATE_ENABLE for the following tests:\n");
+	enable_tiledata();
+	test_context_switch();
+	test_fork();
+	test_signal();
+
+	clearhandler(SIGILL);
+
+	free(tilecfg);
+	free(tiledata);
+	free(xsave_buffer);
+	return nerrs ? 1 : 0;
+}
-- 
2.17.1



* [PATCH v11 25/29] x86/insn/amx: Add TILERELEASE instruction to the opcode map
  2021-10-01 22:36 [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (23 preceding siblings ...)
  2021-10-01 22:37 ` [PATCH v11 24/29] selftest/x86/amx: Test cases for the AMX state management Chang S. Bae
@ 2021-10-01 22:37 ` Chang S. Bae
  2021-10-01 22:37 ` [PATCH v11 26/29] intel_idle/amx: Add SPR support with XTILEDATA capability Chang S. Bae
                   ` (6 subsequent siblings)
  31 siblings, 0 replies; 45+ messages in thread
From: Chang S. Bae @ 2021-10-01 22:37 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, lenb, dave.hansen, thiago.macieira, jing2.liu,
	ravi.v.shankar, linux-kernel, chang.seok.bae

Add the opcode of TILERELEASE, which returns all AMX state to INIT-state,
to the x86 opcode map.
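
To see how the new map entries line up with the bytes that tile_release()
emits later in this series (c4 e2 78 49 c0), the sketch below decodes a
3-byte VEX instruction. The map-select field gives 2 (the 0F38 table,
"AVXcode: 2"), the opcode is 0x49 ("49: Grp22 (1A)"), VEX.L = 0 and pp = 0
fit the (v1) and (!F3) annotations, and ModRM mod = 11B with reg = 0 selects
"0: TILERELEASE" in Grp22. decode_vex3() and struct vex3 are illustrative
only and ignore the inverted R/X/B bits.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a 3-byte VEX-encoded instruction (C4 prefix) plus ModRM. */
struct vex3 {
	unsigned map;		/* m-mmmm: 1 = 0F, 2 = 0F38, 3 = 0F3A */
	unsigned w;		/* VEX.W */
	unsigned vvvv;		/* extra register operand, stored inverted */
	unsigned l;		/* VEX.L: 0 = 128-bit */
	unsigned pp;		/* implied prefix: 0 = none, 1 = 66, 2 = F3, 3 = F2 */
	uint8_t opcode;
	unsigned mod, reg, rm;	/* ModRM fields */
};

/* Decode C4 <byte1> <byte2> <opcode> <modrm>; 0 on success, -1 otherwise. */
static int decode_vex3(const uint8_t *b, struct vex3 *out)
{
	assert(b != 0 && out != 0);
	if (b[0] != 0xc4)
		return -1;
	out->map = b[1] & 0x1f;
	out->w = b[2] >> 7;
	out->vvvv = (~(unsigned)b[2] >> 3) & 0xf;	/* undo the inversion */
	out->l = (b[2] >> 2) & 1;
	out->pp = b[2] & 3;
	out->opcode = b[3];
	out->mod = b[4] >> 6;
	out->reg = (b[4] >> 3) & 7;
	out->rm = b[4] & 7;
	return 0;
}
```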

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v4:
* Added as a new patch, in preparation for using the instruction in the kernel.
---
 arch/x86/lib/x86-opcode-map.txt       | 8 +++++++-
 tools/arch/x86/lib/x86-opcode-map.txt | 8 +++++++-
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index ec31f5b60323..dbc5078ccafe 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -690,7 +690,9 @@ AVXcode: 2
 45: vpsrlvd/q Vx,Hx,Wx (66),(v)
 46: vpsravd Vx,Hx,Wx (66),(v) | vpsravd/q Vx,Hx,Wx (66),(evo)
 47: vpsllvd/q Vx,Hx,Wx (66),(v)
-# Skip 0x48-0x4b
+# Skip 0x48
+49: Grp22 (1A)
+# Skip 0x4a-0x4b
 4c: vrcp14ps/d Vpd,Wpd (66),(ev)
 4d: vrcp14ss/d Vsd,Hpd,Wsd (66),(ev)
 4e: vrsqrt14ps/d Vpd,Wpd (66),(ev)
@@ -1082,6 +1084,10 @@ GrpTable: Grp21
 7: ENDBR64 (F3),(010),(11B) | ENDBR32 (F3),(011),(11B)
 EndTable
 
+GrpTable: Grp22
+0: TILERELEASE (!F3),(v1),(11B)
+EndTable
+
 # AMD's Prefetch Group
 GrpTable: GrpP
 0: PREFETCH
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index ec31f5b60323..dbc5078ccafe 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -690,7 +690,9 @@ AVXcode: 2
 45: vpsrlvd/q Vx,Hx,Wx (66),(v)
 46: vpsravd Vx,Hx,Wx (66),(v) | vpsravd/q Vx,Hx,Wx (66),(evo)
 47: vpsllvd/q Vx,Hx,Wx (66),(v)
-# Skip 0x48-0x4b
+# Skip 0x48
+49: Grp22 (1A)
+# Skip 0x4a-0x4b
 4c: vrcp14ps/d Vpd,Wpd (66),(ev)
 4d: vrcp14ss/d Vsd,Hpd,Wsd (66),(ev)
 4e: vrsqrt14ps/d Vpd,Wpd (66),(ev)
@@ -1082,6 +1084,10 @@ GrpTable: Grp21
 7: ENDBR64 (F3),(010),(11B) | ENDBR32 (F3),(011),(11B)
 EndTable
 
+GrpTable: Grp22
+0: TILERELEASE (!F3),(v1),(11B)
+EndTable
+
 # AMD's Prefetch Group
 GrpTable: GrpP
 0: PREFETCH
-- 
2.17.1



* [PATCH v11 26/29] intel_idle/amx: Add SPR support with XTILEDATA capability
  2021-10-01 22:36 [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (24 preceding siblings ...)
  2021-10-01 22:37 ` [PATCH v11 25/29] x86/insn/amx: Add TILERELEASE instruction to the opcode map Chang S. Bae
@ 2021-10-01 22:37 ` Chang S. Bae
  2021-10-01 22:37 ` [PATCH v11 27/29] x86/fpu/xstate: Add a sanity check for XFD state when saving XSTATE Chang S. Bae
                   ` (5 subsequent siblings)
  31 siblings, 0 replies; 45+ messages in thread
From: Chang S. Bae @ 2021-10-01 22:37 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, lenb, dave.hansen, thiago.macieira, jing2.liu,
	ravi.v.shankar, linux-kernel, chang.seok.bae, linux-pm

Add a custom Sapphire Rapids (SPR) C-state table to the intel_idle driver. The
parameters in this table are preferred over those supplied by ACPI.

SPR supports AMX, and so this custom table uses idle entry points that know
how to initialize AMX TMM state, if necessary.

This guarantees that AMX TMM state will never be the cause of hardware
C-state demotion from C6 to C1E. Under some conditions this may result in
improved power savings, and thus higher available turbo frequency budget.
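
The decision made by idle_tile() can be modeled in user space as a pure
function. In this hypothetical sketch, has_xgetbv1 stands in for
cpu_feature_enabled(X86_FEATURE_XGETBV1) and xinuse for the XINUSE bitmap
returned by XGETBV with ECX = 1; the TILERELEASE and fpregs_deactivate()
steps themselves are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XFEATURE_MASK_XTILECFG	(1ULL << 17)
#define XFEATURE_MASK_XTILEDATA	(1ULL << 18)
#define XFEATURE_MASK_XTILE	(XFEATURE_MASK_XTILECFG | XFEATURE_MASK_XTILEDATA)

static_assert(XFEATURE_MASK_XTILE == 0x60000ULL, "AMX components 17 and 18");

/* Return true if idle entry should execute TILERELEASE first. */
static bool should_release_tiles(bool has_xgetbv1, uint64_t xinuse)
{
	/* Without XGETBV1 the in-use bitmap cannot be read; do nothing. */
	if (!has_xgetbv1)
		return false;
	/* Any AMX component not in INIT-state may block deeper C-states. */
	return (xinuse & XFEATURE_MASK_XTILE) != 0;
}
```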

[ Based on patch by Artem Bityutskiy <artem.bityutskiy@linux.intel.com>. ]

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org
---
Changes from v9:
* Add a comment to use tile_release() after preempt_disable(). (Dave
  Hansen)
* Use cpu_feature_enabled() instead of boot_cpu_has(). (Borislav Petkov)
* Add a Suggested-by tag.

Changes from v6:
* Update the changelog and function description. (Rafael J. Wysocki)

Changes from v5:
* Moved the code to intel_idle. (Peter Zijlstra)
* Fixed to deactivate fpregs. (Andy Lutomirski and Dave Hansen)
* Updated the code comment. (Dave Hansen)

Changes from v4:
* Added as a new patch. (Thomas Gleixner)
---
 arch/x86/include/asm/special_insns.h |  6 ++
 drivers/idle/intel_idle.c            | 82 ++++++++++++++++++++++++++++
 2 files changed, 88 insertions(+)

diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index 68c257a3de0d..e105c27c2951 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -294,6 +294,12 @@ static inline int enqcmds(void __iomem *dst, const void *src)
 	return 0;
 }
 
+static inline void tile_release(void)
+{
+	/* Instruction opcode for TILERELEASE; supported in binutils >= 2.36. */
+	asm volatile(".byte 0xc4, 0xe2, 0x78, 0x49, 0xc0");
+}
+
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_X86_SPECIAL_INSNS_H */
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index e6c543b5ee1d..72b72fa0e072 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -54,6 +54,8 @@
 #include <asm/intel-family.h>
 #include <asm/mwait.h>
 #include <asm/msr.h>
+#include <asm/fpu/internal.h>
+#include <asm/special_insns.h>
 
 #define INTEL_IDLE_VERSION "0.5.1"
 
@@ -155,6 +157,58 @@ static __cpuidle int intel_idle_s2idle(struct cpuidle_device *dev,
 	return 0;
 }
 
+/**
+ * idle_tile - Put the TILE registers into INIT state
+ *
+ * Leaving state in the dirty TILE registers may prevent the processor from
+ * entering lower-power idle states. Use TILERELEASE to initialize the
+ * state. Destroying fpregs state is safe after the fpstate update.
+ *
+ * WARNING: It should be called after preemption is disabled; otherwise,
+ * the task may be rescheduled with the destroyed state.
+ */
+static inline void idle_tile(void)
+{
+	if (cpu_feature_enabled(X86_FEATURE_XGETBV1) && (xgetbv(1) & XFEATURE_MASK_XTILE)) {
+		tile_release();
+		fpregs_deactivate(&current->thread.fpu);
+	}
+}
+
+/**
+ * intel_idle_tile - Ask the processor to enter the given idle state.
+ * @dev: cpuidle device of the target CPU.
+ * @drv: cpuidle driver (assumed to point to intel_idle_driver).
+ * @index: Target idle state index.
+ *
+ * Ensure the TILE registers are in INIT state before using intel_idle()
+ * to enter the idle state.
+ */
+static __cpuidle int intel_idle_tile(struct cpuidle_device *dev,
+				     struct cpuidle_driver *drv, int index)
+{
+	idle_tile();
+
+	return intel_idle(dev, drv, index);
+}
+
+/**
+ * intel_idle_s2idle_tile - Ask the processor to enter the given idle state.
+ * @dev: cpuidle device of the target CPU.
+ * @drv: cpuidle driver (assumed to point to intel_idle_driver).
+ * @index: Target idle state index.
+ *
+ * Ensure the TILE registers are in INIT state before using
+ * intel_idle_s2idle() to enter the idle state.
+ */
+static __cpuidle int intel_idle_s2idle_tile(struct cpuidle_device *dev,
+					    struct cpuidle_driver *drv, int index)
+{
+	idle_tile();
+
+	return intel_idle_s2idle(dev, drv, index);
+}
+
 /*
  * States are indexed by the cstate number,
  * which is also the index into the MWAIT hint array.
@@ -752,6 +806,27 @@ static struct cpuidle_state icx_cstates[] __initdata = {
 		.enter = NULL }
 };
 
+static struct cpuidle_state spr_cstates[] __initdata = {
+	{
+		.name = "C1",
+		.desc = "MWAIT 0x00",
+		.flags = MWAIT2flg(0x00),
+		.exit_latency = 1,
+		.target_residency = 1,
+		.enter = &intel_idle,
+		.enter_s2idle = intel_idle_s2idle, },
+	{
+		.name = "C6",
+		.desc = "MWAIT 0x20",
+		.flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED,
+		.exit_latency = 128,
+		.target_residency = 384,
+		.enter = &intel_idle_tile,
+		.enter_s2idle = intel_idle_s2idle_tile, },
+	{
+		.enter = NULL }
+};
+
 static struct cpuidle_state atom_cstates[] __initdata = {
 	{
 		.name = "C1E",
@@ -1095,6 +1170,12 @@ static const struct idle_cpu idle_cpu_icx __initconst = {
 	.use_acpi = true,
 };
 
+static const struct idle_cpu idle_cpu_spr __initconst = {
+	.state_table = spr_cstates,
+	.disable_promotion_to_c1e = true,
+	.use_acpi = true,
+};
+
 static const struct idle_cpu idle_cpu_avn __initconst = {
 	.state_table = avn_cstates,
 	.disable_promotion_to_c1e = true,
@@ -1157,6 +1238,7 @@ static const struct x86_cpu_id intel_idle_ids[] __initconst = {
 	X86_MATCH_INTEL_FAM6_MODEL(SKYLAKE_X,		&idle_cpu_skx),
 	X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X,		&idle_cpu_icx),
 	X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_D,		&idle_cpu_icx),
+	X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X,	&idle_cpu_spr),
 	X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNL,	&idle_cpu_knl),
 	X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNM,	&idle_cpu_knl),
 	X86_MATCH_INTEL_FAM6_MODEL(ATOM_GOLDMONT,	&idle_cpu_bxt),
-- 
2.17.1


* [PATCH v11 27/29] x86/fpu/xstate: Add a sanity check for XFD state when saving XSTATE
  2021-10-01 22:36 [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (25 preceding siblings ...)
  2021-10-01 22:37 ` [PATCH v11 26/29] intel_idle/amx: Add SPR support with XTILEDATA capability Chang S. Bae
@ 2021-10-01 22:37 ` Chang S. Bae
  2021-10-01 22:37 ` [PATCH v11 28/29] x86/arch_prctl: ARCH_GET_FEATURES_WITH_KERNEL_ASSISTANCE Chang S. Bae
                   ` (4 subsequent siblings)
  31 siblings, 0 replies; 45+ messages in thread
From: Chang S. Bae @ 2021-10-01 22:37 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, lenb, dave.hansen, thiago.macieira, jing2.liu,
	ravi.v.shankar, linux-kernel, chang.seok.bae

Add a sanity check, enabled with CONFIG_X86_DEBUG_FPU, that the XFD state
is consistent with the XINUSE state.

Instead of reading the IA32_XFD MSR directly, read a per-CPU shadow value
that is recorded on every MSR write.
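A minimal single-CPU sketch of the shadow scheme, assuming a plain global
in place of the per-CPU variable and a fake MSR in place of wrmsrl_safe().
fake_msr_xfd and xfd_state_consistent() are hypothetical names. The point
is that routing every write through xfd_write() keeps the shadow exact, so
the debug check never has to read the MSR back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for one CPU: a fake MSR and its shadow copy. */
static uint64_t fake_msr_xfd;
static uint64_t xfd_shadow;

/* Every write goes through this helper, so the shadow never goes stale. */
static void xfd_write(uint64_t value)
{
	fake_msr_xfd = value;	/* stands in for wrmsrl_safe(MSR_IA32_XFD, value) */
	xfd_shadow = value;	/* record consulted by the debug check */
}

/*
 * Debug check before XSAVE*: a feature that is in use (XINUSE bit set)
 * must not be armed in XFD, or XSAVE* would silently skip its state.
 * Returns true when the two are consistent.
 */
static bool xfd_state_consistent(uint64_t xinuse)
{
	return (xinuse & xfd_shadow) == 0;
}
```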

Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v9:
* Re-introduce xfd_write() to record every XFD write.
* Use cpu_feature_enabled() instead of boot_cpu_has(). (Borislav Petkov)

Changes from v5:
* Added as a new patch. (Dave Hansen)
---
 arch/x86/include/asm/fpu/internal.h | 22 +++++++++++++++++++++-
 arch/x86/kernel/fpu/core.c          | 13 +++++++++++++
 arch/x86/kernel/fpu/xstate.c        | 12 +++++-------
 arch/x86/kernel/traps.c             |  9 ++++-----
 4 files changed, 43 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 5f013fa0b205..1129abc6ae06 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -556,6 +556,26 @@ static inline void switch_fpu_prepare(struct fpu *old_fpu, int cpu)
  * Misc helper functions:
  */
 
+#ifdef CONFIG_X86_DEBUG_FPU
+DECLARE_PER_CPU(u64, xfd_shadow);
+static inline u64 xfd_debug_shadow(void)
+{
+	return this_cpu_read(xfd_shadow);
+}
+
+static inline void xfd_write(u64 value)
+{
+	wrmsrl_safe(MSR_IA32_XFD, value);
+	this_cpu_write(xfd_shadow, value);
+}
+#else
+#define xfd_debug_shadow()	0
+static inline void xfd_write(u64 value)
+{
+	wrmsrl_safe(MSR_IA32_XFD, value);
+}
+#endif
+
 /**
  * xfd_switch - Switches the MSR IA32_XFD context if needed.
  * @prev:	The previous task's struct fpu pointer
@@ -572,7 +592,7 @@ static inline void xfd_switch(struct fpu *prev, struct fpu *next)
 	next_xfd_mask = next->state_mask & xfeatures_mask_user_dynamic;
 
 	if (unlikely(prev_xfd_mask != next_xfd_mask))
-		wrmsrl_safe(MSR_IA32_XFD, xfeatures_mask_user_dynamic ^ next_xfd_mask);
+		xfd_write(xfeatures_mask_user_dynamic ^ next_xfd_mask);
 }
 
 /*
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index fdac0f430af3..be6c210c00d4 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -82,6 +82,10 @@ bool irq_fpu_usable(void)
 }
 EXPORT_SYMBOL(irq_fpu_usable);
 
+#ifdef CONFIG_X86_DEBUG_FPU
+DEFINE_PER_CPU(u64, xfd_shadow);
+#endif
+
 /*
  * Save the FPU register state in fpu->state. The register state is
  * preserved.
@@ -99,6 +103,15 @@ EXPORT_SYMBOL(irq_fpu_usable);
 void save_fpregs_to_fpstate(struct fpu *fpu)
 {
 	if (likely(use_xsave())) {
+		/*
+		 * If XFD is armed for an xfeature, XSAVE* will not save
+		 * its state. Verify XFD is clear for all features that
+		 * are in use before XSAVE*.
+		 */
+		if (IS_ENABLED(CONFIG_X86_DEBUG_FPU) && cpu_feature_enabled(X86_FEATURE_XFD) &&
+		    cpu_feature_enabled(X86_FEATURE_XGETBV1))
+			WARN_ON_FPU(xgetbv(1) & xfd_debug_shadow());
+
 		os_xsave(&fpu->state->xsave, fpu->state_mask);
 
 		/*
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 43539893dd82..81566a18643b 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -269,7 +269,7 @@ void fpu__init_cpu_xstate(void)
 	}
 
 	if (boot_cpu_has(X86_FEATURE_XFD))
-		wrmsrl(MSR_IA32_XFD, xfeatures_mask_user_dynamic);
+		xfd_write(xfeatures_mask_user_dynamic);
 }
 
 static bool xfeature_enabled(enum xfeature xfeature)
@@ -1095,9 +1095,8 @@ void fpu__resume_cpu(void)
 	}
 
 	if (cpu_feature_enabled(X86_FEATURE_XFD))
-		wrmsrl_safe(MSR_IA32_XFD, (current->thread.fpu.state_mask &
-					   xfeatures_mask_user_dynamic) ^
-					  xfeatures_mask_user_dynamic);
+		xfd_write((current->thread.fpu.state_mask & xfeatures_mask_user_dynamic) ^
+			  xfeatures_mask_user_dynamic);
 }
 
 /**
@@ -1325,9 +1324,8 @@ void reset_state_perm(struct task_struct *tsk)
 	if (cpu_feature_enabled(X86_FEATURE_XSAVES))
 		fpstate_init_xstate(&fpu->state->xsave, fpu->state_mask);
 
-	wrmsrl_safe(MSR_IA32_XFD,
-		    (fpu->state_mask & xfeatures_mask_user_dynamic) ^
-		    xfeatures_mask_user_dynamic);
+	xfd_write((fpu->state_mask & xfeatures_mask_user_dynamic) ^
+		  xfeatures_mask_user_dynamic);
 }
 
 /**
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index bbf30e73d156..cc19b570b322 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -1129,7 +1129,7 @@ static __always_inline bool handle_xfd_event(struct fpu *fpu, struct pt_regs *re
 			 * unblock the task.
 			 */
 			rdmsrl_safe(MSR_IA32_XFD, &value);
-			wrmsrl_safe(MSR_IA32_XFD, value & ~xfd_err);
+			xfd_write(value & ~xfd_err);
 		} else {
 			struct fpu *fpu = &current->thread.fpu;
 			int err = -1;
@@ -1148,10 +1148,9 @@ static __always_inline bool handle_xfd_event(struct fpu *fpu, struct pt_regs *re
 				if (!WARN_ON(in_interrupt())) {
 					err = realloc_xstate_buffer(fpu, xfd_event);
 					if (!err)
-						wrmsrl_safe(MSR_IA32_XFD,
-							    (fpu->state_mask &
-							     xfeatures_mask_user_dynamic) ^
-							    xfeatures_mask_user_dynamic);
+						xfd_write((fpu->state_mask &
+							  xfeatures_mask_user_dynamic) ^
+							  xfeatures_mask_user_dynamic);
 				}
 
 				if (err)
-- 
2.17.1


* [PATCH v11 28/29] x86/arch_prctl: ARCH_GET_FEATURES_WITH_KERNEL_ASSISTANCE
  2021-10-01 22:36 [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (26 preceding siblings ...)
  2021-10-01 22:37 ` [PATCH v11 27/29] x86/fpu/xstate: Add a sanity check for XFD state when saving XSTATE Chang S. Bae
@ 2021-10-01 22:37 ` Chang S. Bae
  2021-10-01 22:37 ` [PATCH v11 29/29] x86/arch_prctl: ARCH_SET_STATE_ENABLE_ALLOC Chang S. Bae
                   ` (3 subsequent siblings)
  31 siblings, 0 replies; 45+ messages in thread
From: Chang S. Bae @ 2021-10-01 22:37 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, lenb, dave.hansen, thiago.macieira, jing2.liu,
	ravi.v.shankar, linux-kernel, chang.seok.bae

arch_prctl(ARCH_GET_FEATURES_WITH_KERNEL_ASSISTANCE, u64 *bitmask)
    Return the bitmask of the features the kernel supports. If XSAVE is
    not in use, the bitmask indicates only the legacy states (x87, plus
    SSE when FXSR is available).
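The fallback order that get_features_mask_uabi() applies can be sketched
as a pure function. struct fpu_caps and features_mask_uabi() are
hypothetical. The FP and SSE mask values follow the usual XSAVE component
numbering (bit 0 for x87, bit 1 for SSE), which is an assumption of this
sketch rather than something this patch defines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XFEATURE_MASK_FP    (1ULL << 0)	/* legacy x87 state */
#define XFEATURE_MASK_SSE   (1ULL << 1)	/* legacy SSE state */
#define XFEATURE_MASK_FPSSE (XFEATURE_MASK_FP | XFEATURE_MASK_SSE)

/* Hypothetical summary of what the CPU and kernel support. */
struct fpu_caps {
	bool has_fpu;
	bool use_xsave;
	bool use_fxsr;
	uint64_t xsave_uabi_mask;	/* stands in for xfeatures_mask_uabi() */
};

/* Same precedence as get_features_mask_uabi(): XSAVE, then FXSR, then x87. */
static uint64_t features_mask_uabi(const struct fpu_caps *c)
{
	if (!c->has_fpu)
		return 0;
	if (c->use_xsave)
		return c->xsave_uabi_mask;
	if (c->use_fxsr)
		return XFEATURE_MASK_FPSSE;
	return XFEATURE_MASK_FP;
}
```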

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v9:
* Add as a new patch. (Thiago Macieira and Borislav Petkov)
---
 arch/x86/include/uapi/asm/prctl.h | 23 +++++++++++----------
 arch/x86/kernel/fpu/xstate.c      | 34 ++++++++++++++++++++-----------
 arch/x86/kernel/process.c         |  1 +
 3 files changed, 35 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
index c73e141ce90a..6912d5fe85f3 100644
--- a/arch/x86/include/uapi/asm/prctl.h
+++ b/arch/x86/include/uapi/asm/prctl.h
@@ -2,19 +2,20 @@
 #ifndef _ASM_X86_PRCTL_H
 #define _ASM_X86_PRCTL_H
 
-#define ARCH_SET_GS		0x1001
-#define ARCH_SET_FS		0x1002
-#define ARCH_GET_FS		0x1003
-#define ARCH_GET_GS		0x1004
+#define ARCH_SET_GS					0x1001
+#define ARCH_SET_FS					0x1002
+#define ARCH_GET_FS					0x1003
+#define ARCH_GET_GS					0x1004
 
-#define ARCH_GET_CPUID		0x1011
-#define ARCH_SET_CPUID		0x1012
+#define ARCH_GET_CPUID					0x1011
+#define ARCH_SET_CPUID					0x1012
 
-#define ARCH_SET_STATE_ENABLE	0x1021
-#define ARCH_GET_STATE_ENABLE	0x1022
+#define ARCH_GET_FEATURES_WITH_KERNEL_ASSISTANCE	0x1020
+#define ARCH_SET_STATE_ENABLE				0x1021
+#define ARCH_GET_STATE_ENABLE				0x1022
 
-#define ARCH_MAP_VDSO_X32	0x2001
-#define ARCH_MAP_VDSO_32	0x2002
-#define ARCH_MAP_VDSO_64	0x2003
+#define ARCH_MAP_VDSO_X32				0x2001
+#define ARCH_MAP_VDSO_32				0x2002
+#define ARCH_MAP_VDSO_64				0x2003
 
 #endif /* _ASM_X86_PRCTL_H */
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 81566a18643b..75db4def5ec5 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1299,6 +1299,24 @@ int realloc_xstate_buffer(struct fpu *fpu, u64 mask)
 	return 0;
 }
 
+/**
+ * get_features_mask_uabi - Get a feature list that the kernel supports
+ * Return:	A bitmap that indicates which state the kernel enabled.
+ */
+static u64 get_features_mask_uabi(void)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_FPU))
+		return 0;
+
+	if (use_xsave())
+		return xfeatures_mask_uabi();
+
+	if (use_fxsr())
+		return XFEATURE_MASK_FPSSE;
+
+	return XFEATURE_MASK_FP;
+}
+
 /**
  * reset_state_perm - Reset a task's permission for dynamic user state
  *
@@ -1329,7 +1347,7 @@ void reset_state_perm(struct task_struct *tsk)
 }
 
 /**
- * do_arch_prctl_state - Read or write the state permission.
+ * do_arch_prctl_state - Handle xstate-related buffer or usage control
  * @fpu:	A struct task_struct * pointer
  * @option:	A subfunction of arch_prctl()
  * @arg2:	Either a pointer to userspace memory or state-component
@@ -1338,22 +1356,14 @@ void reset_state_perm(struct task_struct *tsk)
  */
 long do_arch_prctl_state(struct task_struct *tsk, int option, unsigned long arg2)
 {
-	u64 features_mask;
-
-	if (!cpu_feature_enabled(X86_FEATURE_FPU))
-		features_mask = 0;
-	else if (use_fxsr())
-		features_mask = XFEATURE_MASK_FPSSE;
-	else
-		features_mask = XFEATURE_MASK_FP;
+	u64 features_mask = get_features_mask_uabi();
 
 	switch (option) {
+	case ARCH_GET_FEATURES_WITH_KERNEL_ASSISTANCE:
+		return put_user(features_mask, (unsigned long __user *)arg2);
 	case ARCH_SET_STATE_ENABLE: {
 		u64 state_perm = arg2;
 
-		if (use_xsave())
-			features_mask = xfeatures_mask_uabi();
-
 		if (state_perm & ~features_mask)
 			return -EINVAL;
 
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index b43e2b0f52f2..ae53ffd76882 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -1023,6 +1023,7 @@ long do_arch_prctl_common(struct task_struct *task, int option,
 		return get_cpuid_mode();
 	case ARCH_SET_CPUID:
 		return set_cpuid_mode(task, arg2);
+	case ARCH_GET_FEATURES_WITH_KERNEL_ASSISTANCE:
 	case ARCH_SET_STATE_ENABLE:
 	case ARCH_GET_STATE_ENABLE:
 		return do_arch_prctl_state(task, option, arg2);
-- 
2.17.1


* [PATCH v11 29/29] x86/arch_prctl: ARCH_SET_STATE_ENABLE_ALLOC
  2021-10-01 22:36 [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (27 preceding siblings ...)
  2021-10-01 22:37 ` [PATCH v11 28/29] x86/arch_prctl: ARCH_GET_FEATURES_WITH_KERNEL_ASSISTANCE Chang S. Bae
@ 2021-10-01 22:37 ` Chang S. Bae
  2021-10-01 22:47 ` [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions Bae, Chang Seok
                   ` (2 subsequent siblings)
  31 siblings, 0 replies; 45+ messages in thread
From: Chang S. Bae @ 2021-10-01 22:37 UTC (permalink / raw)
  To: bp, luto, tglx, mingo, x86
  Cc: len.brown, lenb, dave.hansen, thiago.macieira, jing2.liu,
	ravi.v.shankar, linux-kernel, chang.seok.bae

arch_prctl(ARCH_SET_STATE_ENABLE, u64 bitmask, ARCH_SET_STATE_ENABLE_ALLOC)
    ARCH_SET_STATE_ENABLE users may optionally pass
    ARCH_SET_STATE_ENABLE_ALLOC as the 3rd parameter to ask the kernel to
    synchronously allocate XSTATE buffers large enough for the states in
    the given bitmask, for every thread of the current process. The call
    returns success only if both the permission update and all of the
    allocations succeed. If any allocation fails, no buffers are
    allocated, no permission is granted, and the system call returns an
    error.
    Return codes:
        0: success (including repeated calls)
        ENOMEM: memory allocation failure
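The all-or-nothing rule can be sketched as a two-phase loop, in the
spirit of the two for_each_thread() passes in do_arch_prctl_state(). This
is a single-threaded model under stated assumptions: struct
thread_prealloc, alloc_buffer() and the fail_at test hook are
hypothetical, calloc() stands in for vzalloc(), and the per-thread
spinlocks that the patch holds across both passes are omitted. The caller
must zero-initialize the array so that slots never reached are NULL
during rollback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-thread record: one pending buffer per thread. */
struct thread_prealloc {
	void *state;
};

/* Test hook: fail the allocation for this thread index (-1: never fail). */
static int fail_at = -1;

static void *alloc_buffer(int idx, size_t size)
{
	return idx == fail_at ? NULL : calloc(1, size);
}

/*
 * All-or-nothing preallocation: allocate for every thread; if any
 * allocation fails, free everything allocated so far, so that no thread
 * is left with a partial grant. Returns 0 on success, -1 on failure.
 */
static int prealloc_all(struct thread_prealloc *t, int n, size_t size)
{
	int err = 0;

	for (int i = 0; i < n && !err; i++) {
		t[i].state = alloc_buffer(i, size);
		if (!t[i].state)
			err = -1;
	}

	if (err) {
		for (int i = 0; i < n; i++) {
			free(t[i].state);	/* free(NULL) is a no-op */
			t[i].state = NULL;
		}
	}
	return err;
}
```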

The buffers are pre-allocated and saved in the state field of struct
fpu_prealloc to prevent a race with context switch use of the buffers.
Context switch (or the XFD trap) checks for a pre-allocated buffer and
then installs the state pointer from struct fpu_prealloc.
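A simplified model of the install step, assuming a pthread mutex in place
of the spinlock and malloc()/free() in place of vzalloc()/vfree(). struct
task_fpu, struct prealloc and activate_prealloc() are hypothetical
mirrors of struct fpu, struct fpu_prealloc and
activate_xstate_prealloc_buffer(); the state_is_default flag stands in
for comparing fpu->state against &fpu->__default_state.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical, simplified mirror of struct fpu_prealloc. */
struct prealloc {
	pthread_mutex_t lock;
	void *state;
	unsigned long long mask;
};

/* Hypothetical, simplified mirror of struct fpu. */
struct task_fpu {
	void *state;			/* buffer currently in use */
	unsigned long long state_mask;
	int state_is_default;		/* 1 while using the embedded buffer */
	struct prealloc prealloc;
};

/*
 * Install the pending buffer: free the old dynamic buffer (never the
 * embedded default one), adopt the preallocated pointer and mask, and
 * clear the pending slot, all under the lock a remote allocator takes.
 */
static void activate_prealloc(struct task_fpu *fpu)
{
	pthread_mutex_lock(&fpu->prealloc.lock);

	if (!fpu->state_is_default)
		free(fpu->state);

	fpu->state = fpu->prealloc.state;
	fpu->state_mask = fpu->prealloc.mask;
	fpu->state_is_default = 0;
	fpu->prealloc.state = NULL;
	fpu->prealloc.mask = 0;

	pthread_mutex_unlock(&fpu->prealloc.lock);
}
```

As in the patch, callers check that a preallocated buffer exists before
calling the activation helper.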

This allocation attribute is represented by a new thread flag,
TIF_FPU_PREALLOC. The code always references the group leader's flag to
limit race conditions.

Extend the arch_prctl helper prototypes to receive this option as a third
argument.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v10:
* New patch: add static allocation feature flag:
  ARCH_SET_STATE_ENABLE_ALLOC. (Thomas Gleixner)
---
 arch/x86/include/asm/fpu/internal.h |   2 +
 arch/x86/include/asm/fpu/types.h    |  23 +++++
 arch/x86/include/asm/fpu/xstate.h   |   5 +-
 arch/x86/include/asm/proto.h        |   2 +-
 arch/x86/include/asm/thread_info.h  |   3 +
 arch/x86/include/uapi/asm/prctl.h   |   1 +
 arch/x86/kernel/fpu/core.c          |  29 ++++--
 arch/x86/kernel/fpu/regset.c        |   3 +
 arch/x86/kernel/fpu/xstate.c        | 148 +++++++++++++++++++++++++---
 arch/x86/kernel/process.c           |   5 +-
 arch/x86/kernel/process_32.c        |   5 +-
 arch/x86/kernel/process_64.c        |  10 +-
 arch/x86/kernel/traps.c             |   3 +
 13 files changed, 207 insertions(+), 32 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 1129abc6ae06..20ebbcfeb2a0 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -537,6 +537,8 @@ static inline void fpregs_restore_userregs(void)
 static inline void switch_fpu_prepare(struct fpu *old_fpu, int cpu)
 {
 	if (static_cpu_has(X86_FEATURE_FPU) && !(current->flags & PF_KTHREAD)) {
+		if (old_fpu->prealloc.state)
+			activate_xstate_prealloc_buffer(old_fpu);
 		save_fpregs_to_fpstate(old_fpu);
 		/*
 		 * The save operation preserved register state, so the
diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 1e0a6f73d8a9..ebbb33f6500a 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -336,6 +336,21 @@ union fpregs_state {
 	u8 __padding[PAGE_SIZE];
 };
 
+/**
+ * struct fpu_prealloc - Records a new buffer that is pre-allocated
+ *			 remotely and that the owner picks up at context
+ *			 switch or at buffer expansion.
+ * @lock:		A spinlock to serialize updates to the preallocation
+ * @state:		A pointer to the new buffer.
+ * @mask:		A state mask to indicate which state components are
+ *			saved in the new buffer.
+ */
+struct fpu_prealloc {
+	spinlock_t			lock;
+	union fpregs_state		*state;
+	u64				mask;
+};
+
 /*
  * Highest level per task FPU state data structure that
  * contains the FPU register state plus various FPU
@@ -399,6 +414,14 @@ struct fpu {
 	 */
 	union fpregs_state		*state;
 
+	/*
+	 * @prealloc:
+	 *
+	 * Memory pre-allocated by ARCH_SET_STATE_ENABLE_ALLOC. A task may
+	 * check this and use it on the next context switch.
+	 */
+	struct fpu_prealloc		prealloc;
+
 	/*
 	 * @__default_state:
 	 *
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 0b337e913423..152bfe4939cd 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -174,8 +174,10 @@ extern struct fpu_xstate_buffer_config fpu_buf_cfg;
 
 unsigned int calculate_xstate_buf_size_from_mask(u64 mask);
 void *get_xsave_addr(struct fpu *fpu, int xfeature_nr);
+union fpregs_state *alloc_xstate_buffer(unsigned int size);
 int realloc_xstate_buffer(struct fpu *fpu, u64 mask);
 void free_xstate_buffer(union fpregs_state *state);
+void activate_xstate_prealloc_buffer(struct fpu *fpu);
 
 /**
  * get_group_state_perm - Get a per-process state permission
@@ -217,7 +219,8 @@ static inline bool sig_xstate_expanded(struct task_struct *tsk)
 extern bool arch_enough_sigaltstack(struct task_struct *tsk, size_t ss_size);
 
 void reset_state_perm(struct task_struct *tsk);
-long do_arch_prctl_state(struct task_struct *tsk, int option, unsigned long arg2);
+long do_arch_prctl_state(struct task_struct *tsk, int option, unsigned long arg2,
+			 unsigned long arg3);
 
 int xfeature_size(int xfeature_nr);
 int copy_uabi_from_kernel_to_xstate(struct fpu *fpu, const void *kbuf);
diff --git a/arch/x86/include/asm/proto.h b/arch/x86/include/asm/proto.h
index feed36d44d04..83a77fc85e44 100644
--- a/arch/x86/include/asm/proto.h
+++ b/arch/x86/include/asm/proto.h
@@ -40,6 +40,6 @@ void x86_report_nx(void);
 extern int reboot_force;
 
 long do_arch_prctl_common(struct task_struct *task, int option,
-			  unsigned long arg2);
+			  unsigned long arg2, unsigned long arg3);
 
 #endif /* _ASM_X86_PROTO_H */
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index cf132663c219..ba9e83d6da08 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -97,6 +97,8 @@ struct thread_info {
 #define TIF_FORCED_TF		24	/* true if TF in eflags artificially */
 #define TIF_BLOCKSTEP		25	/* set when we want DEBUGCTLMSR_BTF */
 #define TIF_LAZY_MMU_UPDATES	27	/* task is updating the mmu lazily */
+#define TIF_FPU_PREALLOC	28	/* FPU buffer can be preallocated. */
+					/* Always reference group_leader's value, not each task's */
 #define TIF_ADDR32		29	/* 32-bit address space on 64 bits */
 
 #define _TIF_NOTIFY_RESUME	(1 << TIF_NOTIFY_RESUME)
@@ -120,6 +122,7 @@ struct thread_info {
 #define _TIF_FORCED_TF		(1 << TIF_FORCED_TF)
 #define _TIF_BLOCKSTEP		(1 << TIF_BLOCKSTEP)
 #define _TIF_LAZY_MMU_UPDATES	(1 << TIF_LAZY_MMU_UPDATES)
+#define _TIF_FPU_PREALLOC	(1 << TIF_FPU_PREALLOC)
 #define _TIF_ADDR32		(1 << TIF_ADDR32)
 
 /* flags to check in __switch_to() */
diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
index 6912d5fe85f3..7fc9f161b93d 100644
--- a/arch/x86/include/uapi/asm/prctl.h
+++ b/arch/x86/include/uapi/asm/prctl.h
@@ -12,6 +12,7 @@
 
 #define ARCH_GET_FEATURES_WITH_KERNEL_ASSISTANCE	0x1020
 #define ARCH_SET_STATE_ENABLE				0x1021
+# define ARCH_SET_STATE_ENABLE_ALLOC			1
 #define ARCH_GET_STATE_ENABLE				0x1022
 
 #define ARCH_MAP_VDSO_X32				0x2001
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index be6c210c00d4..d55d80c2e194 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -279,6 +279,7 @@ int fpu_clone(struct task_struct *dst)
 {
 	struct fpu *src_fpu = &current->thread.fpu;
 	struct fpu *dst_fpu = &dst->thread.fpu;
+	unsigned int size;
 
 	/* The new task's FPU state cannot be valid in the hardware. */
 	dst_fpu->last_cpu = -1;
@@ -288,17 +289,33 @@ int fpu_clone(struct task_struct *dst)
 
 	/*
 	 * The child does not inherit the dynamic states but permission.
-	 * Use the buffer embedded in struct task_struct, which has the
-	 * minimum size.
+	 * Expand the buffer to fit the permitted features if preallocation is on.
+	 * Otherwise, use the buffer embedded in struct task_struct, which
+	 * has the minimum size.
 	 */
 	dst_fpu->state_perm = get_group_state_perm(current);
-	dst_fpu->state_mask = (xfeatures_mask_all & ~xfeatures_mask_user_dynamic);
-	dst_fpu->state = &dst_fpu->__default_state;
+	if (test_tsk_thread_flag(current->group_leader, TIF_FPU_PREALLOC)) {
+		size = calculate_xstate_buf_size_from_mask(dst_fpu->state_perm);
+		dst_fpu->state = alloc_xstate_buffer(size);
+		if (!dst_fpu->state)
+			return -ENOMEM;
+		dst_fpu->state_mask = dst_fpu->state_perm;
+		set_tsk_thread_flag(dst, TIF_FPU_PREALLOC);
+	} else {
+		size = fpu_buf_cfg.min_size;
+		dst_fpu->state = &dst_fpu->__default_state;
+		dst_fpu->state_mask = (xfeatures_mask_all & ~xfeatures_mask_user_dynamic);
+		clear_tsk_thread_flag(dst, TIF_FPU_PREALLOC);
+	}
 	/*
 	 * Don't let 'init optimized' areas of the XSAVE area
 	 * leak into the child task:
 	 */
-	memset(&dst_fpu->state->xsave, 0, fpu_buf_cfg.min_size);
+	memset(&dst_fpu->state->xsave, 0, size);
+
+	spin_lock_init(&dst_fpu->prealloc.lock);
+	dst_fpu->prealloc.state = NULL;
+	dst_fpu->prealloc.mask = 0;
 
 	/*
 	 * If the FPU registers are not owned by current just memcpy() the
@@ -307,7 +324,7 @@ int fpu_clone(struct task_struct *dst)
 	 */
 	fpregs_lock();
 	if (test_thread_flag(TIF_NEED_FPU_LOAD))
-		memcpy(dst_fpu->state, src_fpu->state, fpu_buf_cfg.min_size);
+		memcpy(dst_fpu->state, src_fpu->state, size);
 
 	else
 		save_fpregs_to_fpstate(dst_fpu);
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index 14bfbf380015..3d1131bde10d 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -181,6 +181,9 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
 			goto out;
 		}
 
+		if (fpu->prealloc.state)
+			activate_xstate_prealloc_buffer(fpu);
+
 		/* Expand the xstate buffer based on the XSTATE_BV. */
 		ret = realloc_xstate_buffer(fpu, state_mask);
 		if (ret)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 75db4def5ec5..03dc576792a2 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1251,14 +1251,9 @@ void free_xstate_buffer(union fpregs_state *state)
 }
 
 /**
- * realloc_xstate_buffer - Re-alloc a buffer with the size calculated from
- *			   @mask.
+ * alloc_xstate_buffer - Allocate an xstate buffer with the given size
  *
- * @fpu:	A struct fpu * pointer
- * @mask:	The bitmap tells which components to be saved in the new
- *		buffer.
- *
- * It deals with enlarging the xstate buffer with dynamic states.
+ * @size:	The memory size to be allocated
  *
  * Use vzalloc() simply here. If the task with a vzalloc()-allocated buffer
  * tends to terminate quickly, vfree()-induced IPIs may be a concern.
@@ -1267,23 +1262,41 @@ void free_xstate_buffer(union fpregs_state *state)
  *
  * Also, this method does not shrink or reclaim the buffer.
  *
+ * Returns a pointer to the new buffer, or NULL on allocation failure.
+ */
+union fpregs_state *alloc_xstate_buffer(unsigned int size)
+{
+	return vzalloc(size);
+}
+
+/**
+ * realloc_xstate_buffer - Re-allocate a buffer with the size calculated
+ *			   from @mask
+ *
+ * It deals with enlarging the xstate buffer with dynamic states.
+ *
+ * @fpu:	A struct fpu * pointer
+ * @mask:	The bitmap tells which components to be saved in the new
+ *		buffer.
  * Returns 0 on success, -ENOMEM on allocation error.
  */
 int realloc_xstate_buffer(struct fpu *fpu, u64 mask)
 {
 	union fpregs_state *state;
-	u64 state_mask;
+	unsigned int size;
 
-	state_mask = fpu->state_mask | mask;
-	if ((state_mask & fpu->state_mask) == state_mask)
+	if ((mask & fpu->state_mask) == mask)
 		return 0;
 
-	state = vzalloc(calculate_xstate_buf_size_from_mask(state_mask));
+	mask |= fpu->state_mask;
+	size = calculate_xstate_buf_size_from_mask(mask);
+
+	state = alloc_xstate_buffer(size);
 	if (!state)
 		return -ENOMEM;
 
 	if (cpu_feature_enabled(X86_FEATURE_XSAVES))
-		fpstate_init_xstate(&state->xsave, state_mask);
+		fpstate_init_xstate(&state->xsave, mask);
 
 	/* Free the old buffer */
 	if (fpu->state != &fpu->__default_state)
@@ -1295,10 +1308,81 @@ int realloc_xstate_buffer(struct fpu *fpu, u64 mask)
 	 * injection.
 	 */
 	fpu->state = state;
-	fpu->state_mask = state_mask;
+	fpu->state_mask = mask;
+	return 0;
+}
+
+/**
+ * prealloc_xstate_buffer - Pre-allocate a buffer with the size calculated
+ *			    from @mask
+ *
+ * The pre-allocation is recorded in @tsk->thread.fpu.prealloc and will be
+ * used later. The caller is expected to lock the fields in struct
+ * fpu_prealloc.
+ *
+ * @tsk:	A struct task_struct * pointer
+ * @mask:	The bitmap tells which components to be saved in the new
+ *		buffer.
+ * Returns 0 on success, -ENOMEM on allocation error.
+ */
+static int prealloc_xstate_buffer(struct task_struct *tsk, u64 mask)
+{
+	struct fpu *fpu = &tsk->thread.fpu;
+	union fpregs_state *state;
+	unsigned int size;
+
+	mask |= get_group_state_perm(tsk);
+	size = calculate_xstate_buf_size_from_mask(mask);
+
+	state = alloc_xstate_buffer(size);
+	if (!state)
+		return -ENOMEM;
+
+	if (cpu_feature_enabled(X86_FEATURE_XSAVES))
+		fpstate_init_xstate(&state->xsave, mask);
+
+	free_xstate_buffer(fpu->prealloc.state);
+	fpu->prealloc.state = state;
+	fpu->prealloc.mask = mask;
 	return 0;
 }
 
+/**
+ * activate_xstate_prealloc_buffer - Activate the preallocated xstate
+ *                                   buffer.
+ *
+ * @fpu:	A struct fpu * pointer
+ */
+void activate_xstate_prealloc_buffer(struct fpu *fpu)
+{
+	spin_lock_irq(&fpu->prealloc.lock);
+
+	if (fpu->state != &fpu->__default_state)
+		free_xstate_buffer(fpu->state);
+
+	fpu->state = fpu->prealloc.state;
+	fpu->state_mask = fpu->prealloc.mask;
+	fpu->prealloc.state = NULL;
+	fpu->prealloc.mask = 0;
+
+	spin_unlock_irq(&fpu->prealloc.lock);
+}
+
+/**
+ * free_xstate_prealloc_buffer - Free up the preallocated buffer
+ *
+ * The caller must hold the lock protecting the state and mask fields of
+ * struct fpu_prealloc.
+ *
+ * @fpu:	A struct fpu * pointer
+ */
+static void free_xstate_prealloc_buffer(struct fpu *fpu)
+{
+	free_xstate_buffer(fpu->prealloc.state);
+	fpu->prealloc.state = NULL;
+	fpu->prealloc.mask = 0;
+}
+
 /**
  * get_features_mask_uabi - Get a feature list that the kernel supports
  * Return:	A bitmap that indicates which state the kernel enabled.
@@ -1328,20 +1412,26 @@ void reset_state_perm(struct task_struct *tsk)
 {
 	struct fpu *fpu = &tsk->thread.fpu;
 
+	WARN_ON(tsk->signal->nr_threads > 1);
+
 	fpu->state_perm = xfeatures_mask_all & ~xfeatures_mask_user_perm();
 
+	clear_tsk_thread_flag(tsk, TIF_FPU_PREALLOC);
+
 	if (!xfeatures_mask_user_dynamic ||
 	    !(fpu->state_mask & xfeatures_mask_user_dynamic))
 		return;
 
-	WARN_ON(tsk->signal->nr_threads > 1);
-
 	fpu->state_mask = (xfeatures_mask_all & ~xfeatures_mask_user_dynamic);
 	free_xstate_buffer(fpu->state);
 	fpu->state = &fpu->__default_state;
 	if (cpu_feature_enabled(X86_FEATURE_XSAVES))
 		fpstate_init_xstate(&fpu->state->xsave, fpu->state_mask);
 
+	spin_lock_irq(&fpu->prealloc.lock);
+	free_xstate_prealloc_buffer(fpu);
+	spin_unlock_irq(&fpu->prealloc.lock);
+
 	xfd_write((fpu->state_mask & xfeatures_mask_user_dynamic) ^
 		  xfeatures_mask_user_dynamic);
 }
@@ -1352,9 +1442,11 @@ void reset_state_perm(struct task_struct *tsk)
  * @option:	A subfunction of arch_prctl()
  * @arg2:	Either a pointer to userspace memory or state-component
  *		bitmap value.
+ * @arg3:	A sub-option for ARCH_SET_STATE_ENABLE
  * Return:	0 if successful; otherwise, return a relevant error code.
  */
-long do_arch_prctl_state(struct task_struct *tsk, int option, unsigned long arg2)
+long do_arch_prctl_state(struct task_struct *tsk, int option, unsigned long arg2,
+			 unsigned long arg3)
 {
 	u64 features_mask = get_features_mask_uabi();
 
@@ -1362,7 +1454,9 @@ long do_arch_prctl_state(struct task_struct *tsk, int option, unsigned long arg2
 	case ARCH_GET_FEATURES_WITH_KERNEL_ASSISTANCE:
 		return put_user(features_mask, (unsigned long __user *)arg2);
 	case ARCH_SET_STATE_ENABLE: {
+		struct task_struct *t;
 		u64 state_perm = arg2;
+		int err = 0;
 
 		if (state_perm & ~features_mask)
 			return -EINVAL;
@@ -1378,7 +1472,29 @@ long do_arch_prctl_state(struct task_struct *tsk, int option, unsigned long arg2
 		if (tsk->sas_ss_size > 0 && tsk->sas_ss_size < get_sigframe_size())
 			return -EPERM;
 
+		if (arg3 != ARCH_SET_STATE_ENABLE_ALLOC) {
+			tsk->group_leader->thread.fpu.state_perm |= state_perm;
+			return 0;
+		}
+
+		for_each_thread(tsk, t) {
+			spin_lock_irq(&t->thread.fpu.prealloc.lock);
+			if (!err)
+				err = prealloc_xstate_buffer(t, state_perm);
+		}
+
+		for_each_thread(tsk, t) {
+			if (err)
+				free_xstate_prealloc_buffer(&t->thread.fpu);
+			spin_unlock_irq(&t->thread.fpu.prealloc.lock);
+		}
+
+		if (err)
+			return -ENOMEM;
+
 		tsk->group_leader->thread.fpu.state_perm |= state_perm;
+		set_tsk_thread_flag(tsk->group_leader, TIF_FPU_PREALLOC);
+		activate_xstate_prealloc_buffer(&current->thread.fpu);
 		return 0;
 	}
 	case ARCH_GET_STATE_ENABLE: {
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index ae53ffd76882..9d07ec5f08f5 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -105,6 +105,7 @@ void arch_release_task_struct(struct task_struct *task)
 	/* Free up only the dynamically-allocated memory. */
 	if (task->thread.fpu.state != &task->thread.fpu.__default_state)
 		free_xstate_buffer(task->thread.fpu.state);
+	free_xstate_buffer(task->thread.fpu.prealloc.state);
 }
 
 /*
@@ -1016,7 +1017,7 @@ unsigned long get_wchan(struct task_struct *p)
 }
 
 long do_arch_prctl_common(struct task_struct *task, int option,
-			  unsigned long arg2)
+			  unsigned long arg2, unsigned long arg3)
 {
 	switch (option) {
 	case ARCH_GET_CPUID:
@@ -1026,7 +1027,7 @@ long do_arch_prctl_common(struct task_struct *task, int option,
 	case ARCH_GET_FEATURES_WITH_KERNEL_ASSISTANCE:
 	case ARCH_SET_STATE_ENABLE:
 	case ARCH_GET_STATE_ENABLE:
-		return do_arch_prctl_state(task, option, arg2);
+		return do_arch_prctl_state(task, option, arg2, arg3);
 	}
 
 	return -EINVAL;
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 7bd5d08eeb41..41f1410b37f8 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -221,7 +221,8 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	return prev_p;
 }
 
-SYSCALL_DEFINE2(arch_prctl, int, option, unsigned long, arg2)
+SYSCALL_DEFINE3(arch_prctl, int, option, unsigned long, arg2,
+		unsigned long, arg3)
 {
-	return do_arch_prctl_common(current, option, arg2);
+	return do_arch_prctl_common(current, option, arg2, arg3);
 }
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 7aceff54a818..76d23d9ac5c5 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -843,21 +843,23 @@ long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
 	return ret;
 }
 
-SYSCALL_DEFINE2(arch_prctl, int, option, unsigned long, arg2)
+SYSCALL_DEFINE3(arch_prctl, int, option, unsigned long, arg2,
+		unsigned long, arg3)
 {
 	long ret;
 
 	ret = do_arch_prctl_64(current, option, arg2);
 	if (ret == -EINVAL)
-		ret = do_arch_prctl_common(current, option, arg2);
+		ret = do_arch_prctl_common(current, option, arg2, arg3);
 
 	return ret;
 }
 
 #ifdef CONFIG_IA32_EMULATION
-COMPAT_SYSCALL_DEFINE2(arch_prctl, int, option, unsigned long, arg2)
+COMPAT_SYSCALL_DEFINE3(arch_prctl, int, option, unsigned long, arg2,
+		       unsigned long, arg3)
 {
-	return do_arch_prctl_common(current, option, arg2);
+	return do_arch_prctl_common(current, option, arg2, arg3);
 }
 #endif
 
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index cc19b570b322..b0b365084cec 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -1145,6 +1145,9 @@ static __always_inline bool handle_xfd_event(struct fpu *fpu, struct pt_regs *re
 			if (!state_permitted(current, xfd_event)) {
 				force_sig_fault(SIGILL, ILL_ILLOPC, error_get_trap_addr(regs));
 			} else {
+				if (fpu->prealloc.state)
+					activate_xstate_prealloc_buffer(fpu);
+
 				if (!WARN_ON(in_interrupt())) {
 					err = realloc_xstate_buffer(fpu, xfd_event);
 					if (!err)
-- 
2.17.1


* Re: [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions
  2021-10-01 22:36 [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (28 preceding siblings ...)
  2021-10-01 22:37 ` [PATCH v11 29/29] x86/arch_prctl: ARCH_SET_STATE_ENABLE_ALLOC Chang S. Bae
@ 2021-10-01 22:47 ` Bae, Chang Seok
  2021-10-01 22:50 ` Bae, Chang Seok
  2021-10-02 21:54 ` Thomas Gleixner
  31 siblings, 0 replies; 45+ messages in thread
From: Bae, Chang Seok @ 2021-10-01 22:47 UTC
  To: tglx
  Cc: bp, Lutomirski, Andy, mingo, x86, Brown, Len, lenb, Hansen, Dave,
	Macieira, Thiago, Liu, Jing2, Shankar, Ravi V, linux-kernel

Hi Thomas,

Sending this version as it follows up the discussion [1] with some code
changes from v10. This is not intended to ignore your comment on v10 at all.
Appreciate your points on my oversights that I will address in v12 soon.

[1] https://lore.kernel.org/lkml/CAJvTdKkK=_pp1PrWdh1_GN73VifuAkivnErgK+bo2h34Vd_55w@mail.gmail.com/#t

Thanks,
Chang

* Re: [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions
  2021-10-01 22:36 [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (29 preceding siblings ...)
  2021-10-01 22:47 ` [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions Bae, Chang Seok
@ 2021-10-01 22:50 ` Bae, Chang Seok
  2021-10-03  1:05   ` Thomas Gleixner
  2021-10-02 21:54 ` Thomas Gleixner
  31 siblings, 1 reply; 45+ messages in thread
From: Bae, Chang Seok @ 2021-10-01 22:50 UTC
  To: tglx
  Cc: bp, Lutomirski, Andy, mingo, x86, Brown, Len, lenb, Hansen, Dave,
	Macieira, Thiago, Liu, Jing2, Shankar, Ravi V, linux-kernel

Hi Thomas,

Sending this version as it follows up the discussion [1] with some code
changes from v10. This is not intended to ignore your comment on v10 at all.
Appreciate your points on my oversights that I will address in v12 soon.

[1] https://lore.kernel.org/lkml/CAJvTdKkK=_pp1PrWdh1_GN73VifuAkivnErgK+bo2h34Vd_55w@mail.gmail.com/#t

Thanks,
Chang

* Re: [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions
  2021-10-01 22:36 [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
                   ` (30 preceding siblings ...)
  2021-10-01 22:50 ` Bae, Chang Seok
@ 2021-10-02 21:54 ` Thomas Gleixner
  2021-10-02 22:11   ` Bae, Chang Seok
  2021-10-02 22:20   ` Bae, Chang Seok
  31 siblings, 2 replies; 45+ messages in thread
From: Thomas Gleixner @ 2021-10-02 21:54 UTC
  To: Chang S. Bae, bp, luto, mingo, x86
  Cc: len.brown, lenb, dave.hansen, thiago.macieira, jing2.liu,
	ravi.v.shankar, linux-kernel, chang.seok.bae

Chang,

On Fri, Oct 01 2021 at 15:36, Chang S. Bae wrote:
> The patches are built on top of the recent upstream x86 FPU changes [13].

which does not apply on:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git master

because the relentless x86 folks changed the FPU code some more...

You should know the drill by now.

Thanks,

        tglx

* Re: [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions
  2021-10-02 21:54 ` Thomas Gleixner
@ 2021-10-02 22:11   ` Bae, Chang Seok
  2021-10-04 13:44     ` Thomas Gleixner
  2021-10-02 22:20   ` Bae, Chang Seok
  1 sibling, 1 reply; 45+ messages in thread
From: Bae, Chang Seok @ 2021-10-02 22:11 UTC
  To: Thomas Gleixner
  Cc: bp, Lutomirski, Andy, mingo, x86, Brown, Len, lenb, Hansen, Dave,
	Macieira, Thiago, Liu, Jing2, Shankar, Ravi V, linux-kernel

On Oct 2, 2021, at 14:54, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Fri, Oct 01 2021 at 15:36, Chang S. Bae wrote:
>> The patches are built on top of the recent upstream x86 FPU changes [13].
> 
> which does not apply on:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git master
> 
> because the relentless x86 folks changed the FPU code some more...
> 
> You should know the drill by now.

Oh, I’m sorry, that sentence was copied from the old cover letters.

I should have fixed that by saying it is on top of the mainline 5.15-rc3 as
shown on the bottom:

> base-commit: 5816b3e6577eaa676ceb00a848f0fd65fe2adc29

Thanks,
Chang

P.S. I will reply to your comments on v10 shortly.

* Re: [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions
  2021-10-02 21:54 ` Thomas Gleixner
  2021-10-02 22:11   ` Bae, Chang Seok
@ 2021-10-02 22:20   ` Bae, Chang Seok
  1 sibling, 0 replies; 45+ messages in thread
From: Bae, Chang Seok @ 2021-10-02 22:20 UTC
  To: Thomas Gleixner
  Cc: bp, Lutomirski, Andy, mingo, x86, Brown, Len, lenb, Hansen, Dave,
	Macieira, Thiago, Liu, Jing2, Shankar, Ravi V, linux-kernel

On Oct 2, 2021, at 14:54, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Fri, Oct 01 2021 at 15:36, Chang S. Bae wrote:
>> The patches are built on top of the recent upstream x86 FPU changes [13].
> 
> which does not apply on:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git master
> 
> because the relentless x86 folks changed the FPU code some more...
> 
> You should know the drill by now. 

Oh, I’m sorry, that sentence was copied from the old cover letters.

I should have fixed that by saying it is on top of the mainline 5.15-rc3 as
shown on the bottom:

> base-commit: 5816b3e6577eaa676ceb00a848f0fd65fe2adc29

Thanks,
Chang

PS. I will reply to your comments on v10 shortly.
PPS. My earlier mail seems to have gone wrong, sorry.

* Re: [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions
  2021-10-01 22:50 ` Bae, Chang Seok
@ 2021-10-03  1:05   ` Thomas Gleixner
  2021-10-04 14:48     ` Bae, Chang Seok
  0 siblings, 1 reply; 45+ messages in thread
From: Thomas Gleixner @ 2021-10-03  1:05 UTC
  To: Brown, Len
  Cc: Bae, Chang Seok, bp, Lutomirski, Andy, mingo, x86, lenb, Hansen,
	Dave, Macieira, Thiago, Liu, Jing2, Shankar, Ravi V,
	linux-kernel

Len,

On Fri, Oct 01 2021 at 22:50, Chang Seok Bae wrote:
> Sending this version as it follows up the discussion [1] with some code
> changes from v10. This is not intended to ignore your comment on v10 at all.
> Appreciate your points on my oversights that I will address in v12 soon.

why on earth did you make Chang send these patches out when there are
fundamental review comments on the fly vs. the previous version?

The changes vs. V10 just try to address the recently discussed updates
to the permission interface, which we agreed on a couple of days ago,
but all of that still is built on top of something which has serious
flaws at all ends.

Rushing stuff out just to get an internal checkbox ticked off might be
interesting for Intel internal managerial reasons about which I don't
care at all.

But I very much care about the noise in my inbox and the time I have to
spend^W waste on this. Aside of that I care about how this shifts the
blame to someone else. See below. 

Looking at the delta between v10 and v11, it's entirely clear that this
is just a hastily cobbled together update which hasn't even seen any
reasonable scrutiny. Just looking at this gem:

> +long do_arch_prctl_state(struct task_struct *tsk, int option, unsigned long arg2,

...

> +	case ARCH_GET_FEATURES_WITH_KERNEL_ASSISTANCE:
> +		return put_user(features_mask, (unsigned long __user *)arg2);
> +	case ARCH_SET_STATE_ENABLE: {

> +		if (arg3 != ARCH_SET_STATE_ENABLE_ALLOC) {

A suboption in the same namespace? Why? What's wrong with having
explicit options to make the user space interface sensible?

> +			tsk->group_leader->thread.fpu.state_perm |= state_perm;

Accessing tsk->group_leader is safe here because?

> +			return 0;
> +		}
> +
> +		for_each_thread(tsk, t) {

Walking the threads is safe here because?

> +			spin_lock_irq(&t->thread.fpu.prealloc.lock);

Nesting the lock acquisitions for potentially hundreds of threads is
correct in which way?

Also what is disabling interrupts protecting against?

If nesting these locks would be sensible in any way, then how is _irq()
the correct mechanism to use?

> +			if (!err)
> +				err = prealloc_xstate_buffer(t, state_perm);

Surely continuing the list walk when an error happened is a brilliant
idea.

Aside of that this still calls into vzalloc() with interrupts
disabled which is wrong to begin with.
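A minimal userspace sketch of the allocation pattern these two comments point toward (an assumption for illustration, not the code that was eventually merged): do the possibly-sleeping allocations with no lock held, and stop the walk at the first failure, undoing what was already allocated. `struct thread_slot` and `prealloc_all()` are hypothetical names, and `calloc()` stands in for `vzalloc()`.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for one thread's preallocation slot. */
struct thread_slot {
	void *prealloc;
};

/*
 * Allocate a buffer for every thread with no lock held. On the first
 * failure, stop walking, free what was already allocated and report
 * -ENOMEM instead of carrying on through the remaining threads.
 */
static int prealloc_all(struct thread_slot *t, size_t n, size_t size)
{
	size_t i;

	for (i = 0; i < n; i++) {
		t[i].prealloc = calloc(1, size);
		if (!t[i].prealloc)
			goto unwind;	/* t[i] itself is NULL, undo 0..i-1 */
	}
	return 0;

unwind:
	while (i-- > 0) {
		free(t[i].prealloc);
		t[i].prealloc = NULL;
	}
	return -ENOMEM;
}
```

Installing each buffer afterwards would only need that one thread's lock at a time, so no lock nesting is required. Threads created between allocation and installation are not handled by this sketch.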

> +		}
> +
> +		for_each_thread(tsk, t) {
> +			if (err)
> +				free_xstate_prealloc_buffer(&t->thread.fpu);
> +			spin_unlock_irq(&t->thread.fpu.prealloc.lock);

If this ever gets invoked from a process which has threads then the
unconditional enabling of interrupts here is protecting the rest of the
threads against interrupts in which way?
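The interrupt-state problem can be shown with a toy model in plain C (hypothetical names; a single boolean stands in for the CPU's interrupt flag). With two nested `_irq`-style locks, releasing one re-enables interrupts while the other is still held; a save/restore pair released in reverse order does not. This only illustrates the unconditional re-enable, and it is not a claim that irqsave would be the right fix here, since disabling interrupts was not protecting anything in the first place.

```c
#include <stdbool.h>

/* Toy model: one flag stands in for the CPU interrupt-enable state. */
static bool irqs_on = true;

/* Like spin_lock_irq()/spin_unlock_irq(): disable, then enable unconditionally. */
static void toy_lock_irq(void)   { irqs_on = false; }
static void toy_unlock_irq(void) { irqs_on = true; }

/* Like spin_lock_irqsave()/spin_unlock_irqrestore(): save and restore the flag. */
static bool toy_lock_irqsave(void)
{
	bool flags = irqs_on;

	irqs_on = false;
	return flags;
}

static void toy_unlock_irqrestore(bool flags) { irqs_on = flags; }

/* Take A then B, release B, and report the flag while A is still held. */
static bool irqs_on_while_a_held_irq(void)
{
	bool seen;

	toy_lock_irq();		/* A */
	toy_lock_irq();		/* B */
	toy_unlock_irq();	/* B: interrupts come back on */
	seen = irqs_on;		/* A is still held here */
	toy_unlock_irq();	/* A */
	return seen;
}

static bool irqs_on_while_a_held_irqsave(void)
{
	bool fa, fb, seen;

	fa = toy_lock_irqsave();	/* A: saves "on" */
	fb = toy_lock_irqsave();	/* B: saves "off" */
	toy_unlock_irqrestore(fb);	/* B: restores "off" */
	seen = irqs_on;			/* A is still held here */
	toy_unlock_irqrestore(fa);	/* A: restores "on" */
	return seen;
}
```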

Not that the interrupt disable above is protecting anything, but that's
just beyond hilarious, really.

> +		}
> +
> +		if (err)
> +			return -ENOMEM;
> +
> +		tsk->group_leader->thread.fpu.state_perm |= state_perm;
> +		set_tsk_thread_flag(tsk->group_leader, TIF_FPU_PREALLOC);

Again, accessing tsk->group_leader is still safe here because?

> +		return 0;

IOW a total of at least 5 obvious bugs in 60 lines of code (including
new lines and comment). That's an achievement.

Len, it's sad that an experienced kernel developer like you is sticking
a Reviewed-by tag on something like that. Either you forgot everything
about kernel development since you became a manager or you simply do not
care anymore and just want to tick your checkboxes. Neither one of these
options is acceptable.

What's even worse is that you made Chang send that out, instead of
guiding him with your experience to do the right thing. IOW, you make
Chang look bad instead of helping him. Big corporate culture I assume,
but that does not justify that in any way.

Yours seriously grumpy

     Thomas

* Re: [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions
  2021-10-02 22:11   ` Bae, Chang Seok
@ 2021-10-04 13:44     ` Thomas Gleixner
  2021-10-04 14:47       ` Bae, Chang Seok
  0 siblings, 1 reply; 45+ messages in thread
From: Thomas Gleixner @ 2021-10-04 13:44 UTC
  To: Bae, Chang Seok
  Cc: bp, Lutomirski, Andy, mingo, x86, Brown, Len, lenb, Hansen, Dave,
	Macieira, Thiago, Liu, Jing2, Shankar, Ravi V, linux-kernel

On Sat, Oct 02 2021 at 22:11, Chang Seok Bae wrote:
> On Oct 2, 2021, at 14:54, Thomas Gleixner <tglx@linutronix.de> wrote:
>> On Fri, Oct 01 2021 at 15:36, Chang S. Bae wrote:
>>> The patches are built on top of the recent upstream x86 FPU changes [13].
>> 
>> which does not apply on:
>> 
>>   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git master
>> 
>> because the relentless x86 folks changed the FPU code some more...
>> 
>> You should know the drill by now.
>
> Oh, I’m sorry, that sentence was copied from the old cover letters.
>
> I should have fixed that by saying it is on top of the mainline 5.15-rc3 as
> shown on the bottom:
>
>> base-commit: 5816b3e6577eaa676ceb00a848f0fd65fe2adc29

I know what a base commit is, but this still does not make it apply on
the tip tree which has already 10 patches against the FPU code applied
in the x86/fpu branch for 5.16. And that's the reference tree not some
arbitrary chosen base commit.

Thanks,

        tglx

* Re: [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions
  2021-10-04 13:44     ` Thomas Gleixner
@ 2021-10-04 14:47       ` Bae, Chang Seok
  0 siblings, 0 replies; 45+ messages in thread
From: Bae, Chang Seok @ 2021-10-04 14:47 UTC
  To: Thomas Gleixner
  Cc: bp, Lutomirski, Andy, mingo, x86, Brown, Len, lenb, Hansen, Dave,
	Macieira, Thiago, Liu, Jing2, Shankar, Ravi V, linux-kernel

On Oct 4, 2021, at 06:44, Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> 
> I know what a base commit is, but this still does not make it apply on
> the tip tree which has already 10 patches against the FPU code applied
> in the x86/fpu branch for 5.16. And that's the reference tree not some
> arbitrary chosen base commit.

Sorry, I will always make sure the series is based on top of the tip tree.

Thanks,
Chang

* Re: [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions
  2021-10-03  1:05   ` Thomas Gleixner
@ 2021-10-04 14:48     ` Bae, Chang Seok
  0 siblings, 0 replies; 45+ messages in thread
From: Bae, Chang Seok @ 2021-10-04 14:48 UTC
  To: Thomas Gleixner
  Cc: Brown, Len, bp, Lutomirski, Andy, mingo, x86, lenb, Hansen, Dave,
	Macieira, Thiago, Liu, Jing2, Shankar, Ravi V, linux-kernel

On Oct 2, 2021, at 18:05, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Fri, Oct 01 2021 at 22:50, Chang Seok Bae wrote:
>> Sending this version as it follows up the discussion [1] with some code
>> changes from v10. This is not intended to ignore your comment on v10 at all.
>> Appreciate your points on my oversights that I will address in v12 soon.
> 
> why on earth did you make Chang send these patches out when there are
> fundamental review comments on the fly vs. the previous version?

My apologies. I regret sending v11 while the fundamental rework you shared
in the other mail [1] is going on.

Also, it might be better to separate out the last patch in particular: let
it soak with more eyes on it, then send it as an RFC at best, IMO.

I think the right thing now is to focus on helping your rework and
addressing what you pointed out on the v10 patches. I will revisit the last
patch later.

Thanks,
Chang

[1] https://lore.kernel.org/lkml/878rz9gdbb.ffs@tglx/

* Re: [PATCH v11 15/29] x86/arch_prctl: Create ARCH_SET_STATE_ENABLE/ARCH_GET_STATE_ENABLE
  2021-10-01 22:37 ` [PATCH v11 15/29] x86/arch_prctl: Create ARCH_SET_STATE_ENABLE/ARCH_GET_STATE_ENABLE Chang S. Bae
@ 2021-10-05  0:30   ` Thomas Gleixner
  2021-10-05  9:49     ` Thomas Gleixner
  0 siblings, 1 reply; 45+ messages in thread
From: Thomas Gleixner @ 2021-10-05  0:30 UTC
  To: Chang S. Bae, bp, luto, mingo, x86
  Cc: len.brown, lenb, dave.hansen, thiago.macieira, jing2.liu,
	ravi.v.shankar, linux-kernel, chang.seok.bae

On Fri, Oct 01 2021 at 15:37, Chang S. Bae wrote:
> arch_prctl(ARCH_SET_STATE_ENABLE, u64 bitmask)
>     Some XSTATE features, such as AMX, are unavailable to applications
>     until that process explicitly requests them via this call. Requests can
>     be made for any number of valid user XSTATEs in a single call. This
>     call is intended to be invoked very early in process initialization. A
>     forked child inherits access, but permission is reset upon exec. There
>     is no concept of un-requesting XSTATE access.
>     Return codes:
>         0: success (including repeated calls)
>         EINVAL: no hardware feature for the request
>
> arch_prctl(ARCH_GET_STATE_ENABLE, u64 *bitmask)
>     Return the bitmask of permitted user XSTATE features. If XSAVE is
>     disabled, the bitmask indicates only legacy states.
>
> The permission is checked at every XSTATE buffer expansion: e.g.
> XFD-induced #NM event, and ptracer's XSTATE injection. When no permission
> is found, inform userspace via SIGILL or with error code.
>
> For "dynamic" XSTATE features that have XFD hardware support, the kernel
> can enforce that users can not touch state without permission. For state
> that has no XFD support, the kernel can not prevent a user from touching
> that state.
>
> The notion of granted permission is recorded in the group leader only. A
> new task copies its permission bitmask.

while this patch does again way more than the subject suggests and
really should be split into bits and pieces, the prctl is ill defined
and the implementation is partially buggy.

> +	/*
> +	 * @state_perm:
> +	 *
> +	 * This bitmap indicates the permission for state components.
> +	 *
> +	 * Always reference group_leader's value via
> +	 * get_group_state_perm() as it readily represents the process's
> +	 * state permission.
> +	 */
> +	u64				state_perm;

This wants to be __state_perm to denote that this should not be
accessed directly.

Also the only reason to access this is when a task triggers a permission
check against its own permissions; it should not be unconditionally
dereferenced all over the place.

The point is that you don't want to dereference tsk->group_leader if not
absolutely required. That's why I want to have the information in
fpstate which is thread local and has to be accessed anyway.

Only if tsk->fpstate does not provide the permission then the group
leader state has to be checked, i.e. in #NM and ptrace which is a slow
path anyway. In the case that the permission check on the thread local
info fails then the task is either going to be killed or an extended
buffer has to be allocated.

See?
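A sketch of the suggested split between a thread-local fast path and a group-leader slow path, under stated assumptions: `struct toy_task`, its `fpstate.perm` copy and the `group` pointer are hypothetical stand-ins for the task's fpstate and `tsk->group_leader`. The slow path only reports the result; the caller would then either raise SIGILL or allocate the larger buffer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for per-thread fpstate and the process-wide state. */
struct toy_fpstate { uint64_t perm; };
struct toy_group   { uint64_t perm; };

struct toy_task {
	struct toy_fpstate fpstate;	/* thread local, always at hand */
	struct toy_group *group;	/* like tsk->group_leader: slow paths only */
};

/* Fast path: no group leader dereference at all. */
static bool xstate_perm_fast(const struct toy_task *t, uint64_t mask)
{
	return (t->fpstate.perm & mask) == mask;
}

/* Slow path (e.g. #NM or ptrace): fall back to the process-wide permission. */
static bool xstate_perm_slow(const struct toy_task *t, uint64_t mask)
{
	if (xstate_perm_fast(t, mask))
		return true;
	return (t->group->perm & mask) == mask;
}
```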

> +/* Require ARCH_SET_STATE_ENABLE for future features  */
> +#define XFEATURE_MASK_PERMISSION_REQUIRED GENMASK_ULL(63, XFEATURE_MAX)

When you add AMX then XFEATURE_MAX is going to be past AMX. And no, even
if you fix that up later in weird ways, this does not make sense.
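The problem with deriving the permission mask from `XFEATURE_MAX` can be checked numerically. The sketch uses a userspace re-implementation of `GENMASK_ULL()` and assumes, for illustration, that `XFEATURE_MAX` is 16 before the AMX components are added and 19 afterwards; AMX uses components 17 and 18, as noted later in this thread.

```c
#include <stdint.h>

/* Userspace re-implementation of GENMASK_ULL(h, l), valid for 0 <= l <= h <= 63. */
#define GENMASK_ULL(h, l) ((~0ULL << (l)) & (~0ULL >> (63 - (h))))

/* AMX tile components 17 and 18. */
#define AMX_MASK ((1ULL << 17) | (1ULL << 18))

/* Assumed XFEATURE_MAX values before and after AMX is added (illustrative). */
#define XFEATURE_MAX_BEFORE_AMX 16
#define XFEATURE_MAX_AFTER_AMX  19

/* Mirrors XFEATURE_MASK_PERMISSION_REQUIRED for a given XFEATURE_MAX. */
static uint64_t perm_required(unsigned int xfeature_max)
{
	return GENMASK_ULL(63, xfeature_max);
}
```

Once adding AMX bumps `XFEATURE_MAX`, bits 17 and 18 fall below the generated mask and AMX silently stops requiring permission.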

> +/**
> + * get_group_state_perm - Get a per-process state permission
> + * @tsk:	A struct task_struct * pointer
> + * Return:	A bitmap to indicate state permission.
> + */
> +static inline u64 get_group_state_perm(struct task_struct *tsk)
> +{
> +	return tsk->group_leader->thread.fpu.state_perm;

This needs a READ_ONCE() because it can be concurrently modified and the
read is lockless. Which means that the write side needs a WRITE_ONCE(),
but see below.
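A userspace imitation of the annotations being asked for. These macros are a simplified sketch relying on GCC/Clang `__typeof__`, not the kernel's actual `READ_ONCE()`/`WRITE_ONCE()` definitions: the volatile access keeps the compiler from caching, repeating or eliding the load or store. `struct toy_fpu` and both helpers are hypothetical.

```c
#include <stdint.h>

/* Simplified imitations of the kernel macros (sketch only). */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical stand-in for the group leader's FPU permission field. */
struct toy_fpu { uint64_t state_perm; };

/* Lockless reader: may run concurrently with the writer below. */
static uint64_t get_group_perm(const struct toy_fpu *leader)
{
	return READ_ONCE(leader->state_perm);
}

/* Writer: callers are assumed to be serialized against each other. */
static void set_group_perm(struct toy_fpu *leader, uint64_t perm)
{
	WRITE_ONCE(leader->state_perm, perm);
}
```

The annotations only constrain the individual accesses; they do not make a concurrent read-modify-write of the same field safe.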

> +}
> +
> +/**
> + * state_permitted - Check a task's permission for indicated features.

state_permitted and all the other state names are way too broad. This is
about xstate and not about random states. We have name spaces for a
reason.

> +#define ARCH_SET_STATE_ENABLE	0x1021
> +#define ARCH_GET_STATE_ENABLE	0x1022

This is about XSTATE components and should be named as such,
i.e. something which is entirely clear, e.g. _XCOMP_ because that is
what this is about. More below.

Aside of that why does this not start with the obvious and simple prctl
option to retrieve the possible supported features?

> +/**
> + * reset_state_perm - Reset a task's permission for dynamic user state
> + *
> + * It is expected to call at exec in which one task runs in a process.
> + *
> + * @task:	A struct task_struct * pointer
> + */
> +void reset_state_perm(struct task_struct *tsk)
> +{
> +	struct fpu *fpu = &tsk->thread.fpu;
> +
> +	fpu->state_perm = xfeatures_mask_all & ~xfeatures_mask_user_perm();
> +
> +	if (!xfeatures_mask_user_dynamic ||
> +	    !(fpu->state_mask & xfeatures_mask_user_dynamic))
> +		return;
> +
> +	WARN_ON(tsk->signal->nr_threads > 1);

Why? The only two callers are from set_personality*().

Aside of that why are you doing this from set_personality()?

This has absolutely nothing to do with set_personality() which is
exclusively about native and compat format of the executable.

At the point where set_personality() is invoked the task which does
exec() has already invoked flush_thread(), which invokes
fpu_flush_thread() which in turn resets the FPU state...

So what?

> +/**
> + * do_arch_prctl_state - Read or write the state permission.
> + * @fpu:	A struct task_struct * pointer

fpu != tsk

Also yuck, that task argument is silly because task cannot be anything
else than current, but that's not your fault.

> + * @option:	A subfunction of arch_prctl()
> + * @arg2:	Either a pointer to userspace memory or state-component
> + *		bitmap value.
> + * Return:	0 if successful; otherwise, return a relevant error code.
> + */
> +long do_arch_prctl_state(struct task_struct *tsk, int option, unsigned long arg2)
> +{
> +	u64 features_mask;
> +
> +	if (!cpu_feature_enabled(X86_FEATURE_FPU))
> +		features_mask = 0;
> +	else if (use_fxsr())
> +		features_mask = XFEATURE_MASK_FPSSE;
> +	else
> +		features_mask = XFEATURE_MASK_FP;

Why? This feature mask should be evaluated once and not over and over again.
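A sketch of evaluating the mask once, assuming a hypothetical boot-time summary of the CPU (`struct toy_cpu`); the bit values follow the architectural layout where FP is component 0 and SSE is component 1, and the selection order mirrors the quoted code.

```c
#include <stdbool.h>
#include <stdint.h>

#define XFEATURE_MASK_FP	(1ULL << 0)
#define XFEATURE_MASK_SSE	(1ULL << 1)

/* Hypothetical summary of what boot-time CPU detection found. */
struct toy_cpu {
	bool has_fpu, has_fxsr, has_xsave;
	uint64_t xsave_uabi_mask;	/* user ABI features enabled with XSAVE */
};

/* Computed once at init, afterwards only read by the prctl handler. */
static uint64_t uabi_features_mask;

static void init_uabi_features_mask(const struct toy_cpu *c)
{
	if (!c->has_fpu)
		uabi_features_mask = 0;
	else if (c->has_xsave)
		uabi_features_mask = c->xsave_uabi_mask;
	else if (c->has_fxsr)
		uabi_features_mask = XFEATURE_MASK_FP | XFEATURE_MASK_SSE;
	else
		uabi_features_mask = XFEATURE_MASK_FP;
}
```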

> +	switch (option) {
> +	case ARCH_SET_STATE_ENABLE: {
> +		u64 state_perm = arg2;

The comments above do not mention anywhere that this limits the
available state space to 32 bits for 32-bit tasks running on a 64-bit
kernel. I don't think it's a problem, but it has to be documented.

> +
> +		if (use_xsave())
> +			features_mask = xfeatures_mask_uabi();
> +
> +		if (state_perm & ~features_mask)
> +			return -EINVAL;
> +
> +		state_perm &= xfeatures_mask_user_perm();
> +		if (!state_perm)
> +			return 0;

I really do not get the semantics of this prctl at all.

GET stores _all_ permitted bits in the user space variable which makes
sense.

SET is just accepting everything except not supported bits, but as it
takes a feature bitmask it suggests that this sets the xfeature bits
which are available for the task or the process.

How does prctl(..., SET, 0) make sense?

It does not make any sense at all. There is no support for downgrading
the permitted features:

    1) Default features up to AVX512 cannot be disabled
    
    2) Once AMX (or any upcoming state) is enabled there is no way back.

So no. This really wants to be

   prctl(SET, xfeature_number)

and not something which is semantically ill defined.

> +		tsk->group_leader->thread.fpu.state_perm |= state_perm;

While this "works" with a single permission controlled state this is
completely broken once more permission controlled states come into play
when tasks of the same process invoke this concurrently and request
different features.
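One way to avoid the lost update on the shared permission word, sketched with C11 atomics in userspace (`group_perm` and `grant_perm()` are hypothetical; the review does not prescribe this fix). The test only exercises the sequential behaviour.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical process-wide permission word. */
static _Atomic uint64_t group_perm;

/*
 * "perm |= req" on a plain variable is a separate load, OR and store, so
 * two tasks granting different features at the same time can both read
 * the old value and one grant is lost. atomic_fetch_or() performs the
 * whole read-modify-write atomically, so concurrent grants accumulate.
 */
static uint64_t grant_perm(uint64_t req)
{
	uint64_t old = atomic_fetch_or(&group_perm, req);

	return old | req;	/* bits known to be set after this grant */
}
```

A lock around the whole request would work too, and is likely the better fit if the grant has to be combined with other checks, such as the sigaltstack size check in the patch.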

> +		return 0;
> +	}
> +	case ARCH_GET_STATE_ENABLE: {
> +		if (use_xsave())
> +			features_mask = get_group_state_perm(tsk);
> +
> +		return put_user(features_mask, (unsigned long __user *)arg2);

This is broken for 32bit kernels. The prctl is unconditional and
therefore this needs to be a *u64 cast.
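The width issue can be modelled in userspace by letting `uint32_t` stand in for a 32-bit kernel's `unsigned long` (an assumption for illustration; the helper names are hypothetical). Storing the u64 mask through the narrower type writes only 4 of the caller's 8 bytes and drops bits 32 and up. The same width limit applies on input: a 32-bit task can only pass bits 0 to 31 in `arg2`.

```c
#include <stdint.h>
#include <string.h>

/* What a store through a 32-bit (unsigned long *) does: low 32 bits, 4 bytes. */
static void store_as_ulong32(void *user_buf, uint64_t mask)
{
	uint32_t v = (uint32_t)mask;	/* bits 32..63 are lost here */

	memcpy(user_buf, &v, sizeof(v));
}

/* What a store through a (u64 *) does: all 64 bits, 8 bytes. */
static void store_as_u64(void *user_buf, uint64_t mask)
{
	memcpy(user_buf, &mask, sizeof(mask));
}
```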

Thanks,

        tglx


* Re: [PATCH v11 15/29] x86/arch_prctl: Create ARCH_SET_STATE_ENABLE/ARCH_GET_STATE_ENABLE
  2021-10-05  0:30   ` Thomas Gleixner
@ 2021-10-05  9:49     ` Thomas Gleixner
  2021-10-05 11:23       ` Peter Zijlstra
  0 siblings, 1 reply; 45+ messages in thread
From: Thomas Gleixner @ 2021-10-05  9:49 UTC
  To: Chang S. Bae, bp, luto, mingo, x86
  Cc: len.brown, lenb, dave.hansen, thiago.macieira, jing2.liu,
	ravi.v.shankar, linux-kernel, chang.seok.bae

On Tue, Oct 05 2021 at 02:30, Thomas Gleixner wrote:
> On Fri, Oct 01 2021 at 15:37, Chang S. Bae wrote:
>> +		if (state_perm & ~features_mask)
>> +			return -EINVAL;
>> +
>> +		state_perm &= xfeatures_mask_user_perm();
>> +		if (!state_perm)
>> +			return 0;
>
> I really do not get the semantics of this prctl at all.
>
> GET stores _all_ permitted bits in the user space variable which makes
> sense.
>
> SET is just accepting everything except not supported bits, but as it
> takes a feature bitmask it suggests that this sets the xfeature bits
> which are available for the task or the process.
>
> How does prctl(..., SET, 0) make sense?
>
> It does not make any sense at all. There is no support for downgrading
> the permitted features:
>
>     1) Default features up to AVX512 cannot be disabled
>     
>     2) Once AMX (or any upcoming state) is enabled there is no way back.
>
> So no. This really wants to be
>
>    prctl(SET, xfeature_number)
>
> and not something which is semantically ill defined.

So of course this is odd at all ends because AMX requires two feature
bits to be enabled (17 and 18).

Now with the above bitmap based thing prctl(SET, 1 << 17) is valid
because it's supported and of course there is no sanity check at all.

With a feature number based interface it's even worse. Duh, should have
thought about that.

So this gives us two options:

   1) Bitmap with proper sanity checks

      reject (1 << 17) and (1 << 18)
      grant (1 << 17 | 1 << 18)

      but for sanity sake and also for ease of filtering, we want to
      restrict a permission request to one functional block at a time.

      #define X86_XCOMP_AMX	(1 << 17 | 1 << 18)
      #define X86_XCOMP_XYZ1    (1 << 19)

      But that gets a bit odd when there is a component which depends on
      others:

      #define X86_XCOMP_XYZ2    (1 << 19 | 1 << 20)

   2) Facility based numerical interface, i.e.

      #define X86_XCOMP_AMX	1
      #define X86_XCOMP_XYZ1    2
      #define X86_XCOMP_XYZ2    3

      is way simpler to understand IMO.
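Option 1 needs a sanity check along these lines. The block definitions are the illustrative ones above (XYZ1 is a placeholder, not a real component), and `check_xcomp_request()` is a hypothetical helper: a request must be exactly one complete, supported functional block, so `1 << 17` alone, `0`, or two blocks at once are all rejected.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative functional blocks from the discussion; XYZ1 is a placeholder. */
#define X86_XCOMP_AMX	((1ULL << 17) | (1ULL << 18))
#define X86_XCOMP_XYZ1	(1ULL << 19)

static const uint64_t xcomp_blocks[] = { X86_XCOMP_AMX, X86_XCOMP_XYZ1 };

/* Accept only a request that is exactly one complete, supported block. */
static int check_xcomp_request(uint64_t req, uint64_t supported)
{
	for (size_t i = 0; i < sizeof(xcomp_blocks) / sizeof(xcomp_blocks[0]); i++) {
		if (req == xcomp_blocks[i])
			return (req & ~supported) ? -EINVAL : 0;
	}
	return -EINVAL;	/* partial block, several blocks, or 0 */
}
```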

Thanks,

        tglx

* Re: [PATCH v11 15/29] x86/arch_prctl: Create ARCH_SET_STATE_ENABLE/ARCH_GET_STATE_ENABLE
  2021-10-05  9:49     ` Thomas Gleixner
@ 2021-10-05 11:23       ` Peter Zijlstra
  2021-10-05 12:27         ` Thomas Gleixner
  0 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2021-10-05 11:23 UTC
  To: Thomas Gleixner
  Cc: Chang S. Bae, bp, luto, mingo, x86, len.brown, lenb, dave.hansen,
	thiago.macieira, jing2.liu, ravi.v.shankar, linux-kernel

On Tue, Oct 05, 2021 at 11:49:05AM +0200, Thomas Gleixner wrote:
> So this gives us two options:
> 
>    1) Bitmap with proper sanity checks
> 
>       reject (1 << 17) and (1 << 18)
>       grant (1 << 17 | 1 << 18)
> 
>       but for sanity sake and also for ease of filtering, we want to
>       restrict a permission request to one functional block at a time.
> 
>       #define X86_XCOMP_AMX	(1 << 17 | 1 << 18)
>       #define X86_XCOMP_XYZ1    (1 << 19)
> 
>       But that gets a bit odd when there is a component which depends on
>       others:
> 
>       #define X86_XCOMP_XYZ2    (1 << 19 | 1 << 20)
> 
>    2) Facility based numerical interface, i.e.
> 
>       #define X86_XCOMP_AMX	1
>       #define X86_XCOMP_XYZ1    2
>       #define X86_XCOMP_XYZ2    3
> 
>       is way simpler to understand IMO.

I'm thinking 2 makes most sense. Perhaps we could use the highest
feature number involved in the facility to denote it? The rationale
being that we don't have to invent yet another enumeration and it's
easier to figure out what's what.

* Re: [PATCH v11 15/29] x86/arch_prctl: Create ARCH_SET_STATE_ENABLE/ARCH_GET_STATE_ENABLE
  2021-10-05 11:23       ` Peter Zijlstra
@ 2021-10-05 12:27         ` Thomas Gleixner
  0 siblings, 0 replies; 45+ messages in thread
From: Thomas Gleixner @ 2021-10-05 12:27 UTC
  To: Peter Zijlstra
  Cc: Chang S. Bae, bp, luto, mingo, x86, len.brown, lenb, dave.hansen,
	thiago.macieira, jing2.liu, ravi.v.shankar, linux-kernel

On Tue, Oct 05 2021 at 13:23, Peter Zijlstra wrote:
> On Tue, Oct 05, 2021 at 11:49:05AM +0200, Thomas Gleixner wrote:
>> So this gives us two options:
>> 
>>    1) Bitmap with proper sanity checks
>> 
>>       reject (1 << 17) and (1 << 18)
>>       grant (1 << 17 | 1 << 18)
>> 
>>       but for sanity sake and also for ease of filtering, we want to
>>       restrict a permission request to one functional block at a time.
>> 
>>       #define X86_XCOMP_AMX	(1 << 17 | 1 << 18)
>>       #define X86_XCOMP_XYZ1    (1 << 19)
>> 
>>       But that gets a bit odd when there is a component which depends on
>>       others:
>> 
>>       #define X86_XCOMP_XYZ2    (1 << 19 | 1 << 20)
>> 
>>    2) Facility based numerical interface, i.e.
>> 
>>       #define X86_XCOMP_AMX	1
>>       #define X86_XCOMP_XYZ1    2
>>       #define X86_XCOMP_XYZ2    3
>> 
>>       is way simpler to understand IMO.
>
> I'm thinking 2 makes most sense. Perhaps we could use the highest
> feature number involved in the facility to denote it? The rationale
> being that we don't have to invent yet another enumeration and it's
> easier to figure out what's what.

That makes sense. So the above would be:

      #define X86_XCOMP_AMX	18	/* implies 17 */
      #define X86_XCOMP_XYZ1	19
      #define X86_XCOMP_XYZ2	20	/* implies 19 */
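
One way the kernel side could expand such a facility number into the component mask it implies is sketched below. The function name is hypothetical, and XYZ1/XYZ2 are the placeholders from this thread. Any number that is not the highest component of a facility (e.g. 17) is rejected, so a partial AMX request cannot be expressed at all.

```c
#include <stdint.h>
#include <errno.h>

/* Facility number == highest component number involved in the facility */
#define X86_XCOMP_AMX	18	/* implies 17 */
#define X86_XCOMP_XYZ1	19
#define X86_XCOMP_XYZ2	20	/* implies 19 */

/* Hypothetical helper: map a facility number to its full component mask */
static int xcomp_to_mask(unsigned int facility, uint64_t *mask)
{
	switch (facility) {
	case X86_XCOMP_AMX:
		*mask = (1ULL << 17) | (1ULL << 18);
		return 0;
	case X86_XCOMP_XYZ1:
		*mask = 1ULL << 19;
		return 0;
	case X86_XCOMP_XYZ2:
		*mask = (1ULL << 19) | (1ULL << 20);
		return 0;
	default:
		return -EINVAL;	/* not the top component of any facility */
	}
}
```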

Thanks,

        tglx

* Re: [PATCH v11 16/29] x86/fpu/xstate: Support both legacy and expanded signal XSTATE size
  2021-10-01 22:37 ` [PATCH v11 16/29] x86/fpu/xstate: Support both legacy and expanded signal XSTATE size Chang S. Bae
@ 2021-10-05 12:30   ` Thomas Gleixner
  2021-10-05 15:19   ` Thomas Gleixner
  1 sibling, 0 replies; 45+ messages in thread
From: Thomas Gleixner @ 2021-10-05 12:30 UTC (permalink / raw)
  To: Chang S. Bae, bp, luto, mingo, x86
  Cc: len.brown, lenb, dave.hansen, thiago.macieira, jing2.liu,
	ravi.v.shankar, linux-kernel, chang.seok.bae

On Fri, Oct 01 2021 at 15:37, Chang S. Bae wrote:
>  static int
>  do_sigaltstack (const stack_t *ss, stack_t *oss, unsigned long sp,
>  		size_t min_ss_size)
> @@ -4187,6 +4192,9 @@ do_sigaltstack (const stack_t *ss, stack_t *oss, unsigned long sp,
>  				return -ENOMEM;
>  		}
>  
> +		if (!arch_enough_sigaltstack(t, ss_size))
> +			return -EINVAL;
> +

How is this even remotely correct?

sigaltstack(2):

  To disable an existing stack, specify ss.ss_flags as SS_DISABLE.  In
  this case, the kernel ignores any other flags in ss.ss_flags and the
  remaining fields in ss.

So for SS_DISABLE, ss_size can legitimately be 0 or any other random
number smaller than the required stack size.

The min_ss_size check is conditional for this very reason.
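
A minimal model of the ordering this implies is sketched below. It assumes the arch size check belongs inside the non-disable path, next to the existing min_ss_size check. The function name and the arch_min_size parameter (standing in for what arch_enough_sigaltstack() would test) are hypothetical, and returning -EINVAL for the arch check simply mirrors the patch.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <errno.h>
#include <signal.h>	/* SS_DISABLE */

/* Hypothetical model of the size checks in do_sigaltstack() */
static int model_altstack_check(int ss_mode, size_t ss_size,
				size_t min_ss_size, size_t arch_min_size)
{
	if (ss_mode == SS_DISABLE)
		return 0;		/* the remaining fields are ignored */
	if (ss_size < min_ss_size)
		return -ENOMEM;		/* generic minimum */
	if (ss_size < arch_min_size)
		return -EINVAL;		/* stand-in for the arch check */
	return 0;
}
```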

Thanks,

        tglx

* Re: [PATCH v11 16/29] x86/fpu/xstate: Support both legacy and expanded signal XSTATE size
  2021-10-01 22:37 ` [PATCH v11 16/29] x86/fpu/xstate: Support both legacy and expanded signal XSTATE size Chang S. Bae
  2021-10-05 12:30   ` Thomas Gleixner
@ 2021-10-05 15:19   ` Thomas Gleixner
  1 sibling, 0 replies; 45+ messages in thread
From: Thomas Gleixner @ 2021-10-05 15:19 UTC (permalink / raw)
  To: Chang S. Bae, bp, luto, mingo, x86
  Cc: len.brown, lenb, dave.hansen, thiago.macieira, jing2.liu,
	ravi.v.shankar, linux-kernel, chang.seok.bae

On Fri, Oct 01 2021 at 15:37, Chang S. Bae wrote:
> @@ -1252,6 +1267,13 @@ long do_arch_prctl_state(struct task_struct *tsk, int option, unsigned long arg2
>  		if (!state_perm)
>  			return 0;
>  
> +		/*
> +		 * Disallow when sigaltstack is not enough for the
> +		 * AT_MINSIGSTKSZ value.
> +		 */
> +		if (tsk->sas_ss_size > 0 && tsk->sas_ss_size < get_sigframe_size())
> +			return -EPERM;

This is not enough:

T1
sigaltstack(minsize)
...
                            T2
                            libinit()
                            prctl(....) --> success
                            enable_amx()

libfunc()         
  if (amx_enabled())
       AMXINSN
       -->#NM --> success

handle_signal()
   die(because altstack too small);
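
The race above comes from checking only the calling task, while the permission is process-wide. The sketch below models one way to close it: the permission request checks the altstack of every thread, and a later sigaltstack() call is validated against the enlarged frame size. All structures and names are hypothetical. The model ignores concurrency, whereas a real implementation would have to serialize both paths (for example under a process-wide lock).

```c
#include <stddef.h>
#include <stdbool.h>
#include <errno.h>

#define MODEL_MAX_THREADS 8	/* hypothetical fixed-size model */

struct model_thread {
	size_t sas_ss_size;	/* 0 == no alternate stack */
};

struct model_process {
	struct model_thread threads[MODEL_MAX_THREADS];
	int nr_threads;
	bool amx_permitted;
	size_t sigframe_size;	/* frame size signal delivery needs */
};

/* Permission request: every thread's altstack must fit the new frame */
static int model_request_perm(struct model_process *p, size_t new_frame_size)
{
	for (int i = 0; i < p->nr_threads; i++) {
		size_t sz = p->threads[i].sas_ss_size;

		if (sz > 0 && sz < new_frame_size)
			return -EPERM;
	}
	p->amx_permitted = true;
	p->sigframe_size = new_frame_size;
	return 0;
}

/* sigaltstack() after the grant must honour the enlarged frame size */
static int model_sigaltstack(struct model_process *p, int tid, size_t size)
{
	if (tid < 0 || tid >= p->nr_threads)
		return -EINVAL;
	if (size > 0 && size < p->sigframe_size)
		return -EINVAL;
	p->threads[tid].sas_ss_size = size;
	return 0;
}
```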

Thanks,

        tglx

end of thread

Thread overview: 45+ messages
2021-10-01 22:36 [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 01/29] x86/fpu/xstate: Fix the state copy function to the XSTATE buffer Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 02/29] x86/fpu/xstate: Modify the initialization helper to handle both static and dynamic buffers Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 03/29] x86/fpu/xstate: Modify state copy helpers " Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 04/29] x86/fpu/xstate: Modify address finders " Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 05/29] x86/fpu/xstate: Add a new variable to indicate dynamic user states Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 06/29] x86/fpu/xstate: Add new variables to indicate dynamic XSTATE buffer size Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 07/29] x86/fpu/xstate: Calculate and remember dynamic XSTATE buffer sizes Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 08/29] x86/fpu/xstate: Convert the struct fpu 'state' field to a pointer Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 09/29] x86/fpu/xstate: Introduce helpers to manage the XSTATE buffer dynamically Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 10/29] x86/fpu/xstate: Update the XSTATE save function to support dynamic states Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 11/29] x86/fpu/xstate: Update the XSTATE buffer address finder " Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 12/29] x86/fpu/xstate: Update the XSTATE context copy function " Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 13/29] x86/fpu/xstate: Use feature disable (XFD) to protect dynamic user state Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 14/29] x86/fpu/xstate: Support ptracer-induced XSTATE buffer expansion Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 15/29] x86/arch_prctl: Create ARCH_SET_STATE_ENABLE/ARCH_GET_STATE_ENABLE Chang S. Bae
2021-10-05  0:30   ` Thomas Gleixner
2021-10-05  9:49     ` Thomas Gleixner
2021-10-05 11:23       ` Peter Zijlstra
2021-10-05 12:27         ` Thomas Gleixner
2021-10-01 22:37 ` [PATCH v11 16/29] x86/fpu/xstate: Support both legacy and expanded signal XSTATE size Chang S. Bae
2021-10-05 12:30   ` Thomas Gleixner
2021-10-05 15:19   ` Thomas Gleixner
2021-10-01 22:37 ` [PATCH v11 17/29] x86/fpu/xstate: Adjust the XSAVE feature table to address gaps in state component numbers Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 18/29] x86/fpu/xstate: Disable XSTATE support if an inconsistent state is detected Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 19/29] x86/cpufeatures/amx: Enumerate Advanced Matrix Extension (AMX) feature bits Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 20/29] x86/fpu/amx: Define AMX state components and have it used for boot-time checks Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 21/29] x86/fpu/amx: Initialize child's AMX state Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 22/29] x86/fpu/amx: Enable the AMX feature in 64-bit mode Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 23/29] x86/fpu/xstate: Skip writing zeros to signal frame for dynamic user states if in INIT-state Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 24/29] selftest/x86/amx: Test cases for the AMX state management Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 25/29] x86/insn/amx: Add TILERELEASE instruction to the opcode map Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 26/29] intel_idle/amx: Add SPR support with XTILEDATA capability Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 27/29] x86/fpu/xstate: Add a sanity check for XFD state when saving XSTATE Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 28/29] x86/arch_prctl: ARCH_GET_FEATURES_WITH_KERNEL_ASSISTANCE Chang S. Bae
2021-10-01 22:37 ` [PATCH v11 29/29] x86/arch_prctl: ARCH_SET_STATE_ENABLE_ALLOC Chang S. Bae
2021-10-01 22:47 ` [PATCH v11 00/29] x86: Support Intel Advanced Matrix Extensions Bae, Chang Seok
2021-10-01 22:50 ` Bae, Chang Seok
2021-10-03  1:05   ` Thomas Gleixner
2021-10-04 14:48     ` Bae, Chang Seok
2021-10-02 21:54 ` Thomas Gleixner
2021-10-02 22:11   ` Bae, Chang Seok
2021-10-04 13:44     ` Thomas Gleixner
2021-10-04 14:47       ` Bae, Chang Seok
2021-10-02 22:20   ` Bae, Chang Seok
