LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic
@ 2021-10-26 12:01 Peter Zijlstra
  2021-10-26 12:01 ` [PATCH v3 01/16] objtool: Classify symbols Peter Zijlstra
                   ` (19 more replies)
  0 siblings, 20 replies; 48+ messages in thread
From: Peter Zijlstra @ 2021-10-26 12:01 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers, bpf

Hi,

These patches rewrite the way retpolines are rewritten. Currently objtool emits
alternative entries for most retpoline calls. However trying to extend that led
to trouble (ELF files are horrid).

Therefore completely overhaul this and have objtool emit a .retpoline_sites
section that lists all compiler generated retpoline thunk calls. Then the
kernel can do with them as it pleases.

Notably it will:

 - rewrite them to indirect instructions for !RETPOLINE
 - rewrite them to lfence; indirect; for RETPOLINE_AMD,
   where size allows (boo clang!)

Specifically, the !RETPOLINE case can now also deal with the clang-special
conditional-indirect-tail-call:

  Jcc __x86_indirect_thunk_\reg.

Finally, also update the x86 BPF jit to catch up to recent times and do these
same things.

All this should help improve performance by removing an indirection.

Patches can (soon) be found here:

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git objtool/core

Changes since v2:

 - rewrite the __x86_indirect_thunk_array[] stuff again
 - rewrite the retpoline,amd rewrite logic, it now also supports
   rewriting the Jcc case, if the original instruction is long enough, but
   more importantly, it's simpler code.
 - bpf label simplification patch
 - random assorted cleanups
 - actually managed to get bpf selftests working

---
 arch/um/kernel/um_arch.c                |   4 +
 arch/x86/include/asm/GEN-for-each-reg.h |  14 ++-
 arch/x86/include/asm/alternative.h      |   1 +
 arch/x86/include/asm/asm-prototypes.h   |  18 ---
 arch/x86/include/asm/nospec-branch.h    |  72 ++---------
 arch/x86/kernel/alternative.c           | 189 ++++++++++++++++++++++++++++-
 arch/x86/kernel/cpu/bugs.c              |   7 --
 arch/x86/kernel/module.c                |   9 +-
 arch/x86/kernel/vmlinux.lds.S           |  14 +++
 arch/x86/lib/retpoline.S                |  56 ++-------
 arch/x86/net/bpf_jit_comp.c             | 160 +++++++++---------------
 arch/x86/net/bpf_jit_comp32.c           |  22 +++-
 tools/objtool/arch/x86/decode.c         | 120 ------------------
 tools/objtool/check.c                   | 208 ++++++++++++++++++++++----------
 tools/objtool/elf.c                     |  84 -------------
 tools/objtool/include/objtool/check.h   |   1 -
 tools/objtool/include/objtool/elf.h     |   6 +-
 tools/objtool/special.c                 |   8 --
 18 files changed, 472 insertions(+), 521 deletions(-)


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 01/16] objtool: Classify symbols
  2021-10-26 12:01 [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
@ 2021-10-26 12:01 ` Peter Zijlstra
  2021-10-28  3:58   ` Josh Poimboeuf
  2021-10-29  8:03   ` [tip: objtool/core] " tip-bot2 for Peter Zijlstra
  2021-10-26 12:01 ` [PATCH v3 02/16] objtool: Explicitly avoid self modifying code in .altinstr_replacement Peter Zijlstra
                   ` (18 subsequent siblings)
  19 siblings, 2 replies; 48+ messages in thread
From: Peter Zijlstra @ 2021-10-26 12:01 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers, bpf

In order to avoid calling str*cmp() on symbol names, over and over, do
them all once upfront and store the result.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 tools/objtool/check.c               |   34 ++++++++++++++++++++++------------
 tools/objtool/include/objtool/elf.h |    5 ++++-
 2 files changed, 26 insertions(+), 13 deletions(-)

--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1012,8 +1012,7 @@ static void add_call_dest(struct objtool
 	 * so they need a little help, NOP out any KCOV calls from noinstr
 	 * text.
 	 */
-	if (insn->sec->noinstr &&
-	    !strncmp(insn->call_dest->name, "__sanitizer_cov_", 16)) {
+	if (insn->sec->noinstr && insn->call_dest->kcov) {
 		if (reloc) {
 			reloc->type = R_NONE;
 			elf_write_reloc(file->elf, reloc);
@@ -1027,7 +1026,7 @@ static void add_call_dest(struct objtool
 		insn->type = sibling ? INSN_RETURN : INSN_NOP;
 	}
 
-	if (mcount && !strcmp(insn->call_dest->name, "__fentry__")) {
+	if (mcount && insn->call_dest->fentry) {
 		if (sibling)
 			WARN_FUNC("Tail call to __fentry__ !?!?", insn->sec, insn->offset);
 
@@ -1077,7 +1076,7 @@ static int add_jump_destinations(struct
 		} else if (reloc->sym->type == STT_SECTION) {
 			dest_sec = reloc->sym->sec;
 			dest_off = arch_dest_reloc_offset(reloc->addend);
-		} else if (arch_is_retpoline(reloc->sym)) {
+		} else if (reloc->sym->retpoline_thunk) {
 			/*
 			 * Retpoline jumps are really dynamic jumps in
 			 * disguise, so convert them accordingly.
@@ -1218,7 +1217,7 @@ static int add_call_destinations(struct
 
 			add_call_dest(file, insn, dest, false);
 
-		} else if (arch_is_retpoline(reloc->sym)) {
+		} else if (reloc->sym->retpoline_thunk) {
 			/*
 			 * Retpoline calls are really dynamic calls in
 			 * disguise, so convert them accordingly.
@@ -1919,17 +1918,28 @@ static int read_intra_function_calls(str
 	return 0;
 }
 
-static int read_static_call_tramps(struct objtool_file *file)
+static int classify_symbols(struct objtool_file *file)
 {
 	struct section *sec;
 	struct symbol *func;
 
 	for_each_sec(file, sec) {
 		list_for_each_entry(func, &sec->symbol_list, list) {
-			if (func->bind == STB_GLOBAL &&
-			    !strncmp(func->name, STATIC_CALL_TRAMP_PREFIX_STR,
+			if (func->bind != STB_GLOBAL)
+				continue;
+
+			if (!strncmp(func->name, STATIC_CALL_TRAMP_PREFIX_STR,
 				     strlen(STATIC_CALL_TRAMP_PREFIX_STR)))
 				func->static_call_tramp = true;
+
+			if (arch_is_retpoline(func))
+				func->retpoline_thunk = true;
+
+			if (!strcmp(func->name, "__fentry__"))
+				func->fentry = true;
+
+			if (!strncmp(func->name, "__sanitizer_cov_", 16))
+				func->kcov = true;
 		}
 	}
 
@@ -1995,7 +2005,7 @@ static int decode_sections(struct objtoo
 	/*
 	 * Must be before add_{jump_call}_destination.
 	 */
-	ret = read_static_call_tramps(file);
+	ret = classify_symbols(file);
 	if (ret)
 		return ret;
 
@@ -2053,9 +2063,9 @@ static int decode_sections(struct objtoo
 
 static bool is_fentry_call(struct instruction *insn)
 {
-	if (insn->type == INSN_CALL && insn->call_dest &&
-	    insn->call_dest->type == STT_NOTYPE &&
-	    !strcmp(insn->call_dest->name, "__fentry__"))
+	if (insn->type == INSN_CALL &&
+	    insn->call_dest &&
+	    insn->call_dest->fentry)
 		return true;
 
 	return false;
--- a/tools/objtool/include/objtool/elf.h
+++ b/tools/objtool/include/objtool/elf.h
@@ -55,7 +55,10 @@ struct symbol {
 	unsigned int len;
 	struct symbol *pfunc, *cfunc, *alias;
 	bool uaccess_safe;
-	bool static_call_tramp;
+	u8 static_call_tramp : 1;
+	u8 retpoline_thunk   : 1;
+	u8 fentry            : 1;
+	u8 kcov              : 1;
 	struct list_head pv_target;
 };
 



^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 02/16] objtool: Explicitly avoid self modifying code in .altinstr_replacement
  2021-10-26 12:01 [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
  2021-10-26 12:01 ` [PATCH v3 01/16] objtool: Classify symbols Peter Zijlstra
@ 2021-10-26 12:01 ` Peter Zijlstra
  2021-10-28  4:01   ` Josh Poimboeuf
  2021-10-29  8:03   ` [tip: objtool/core] " tip-bot2 for Peter Zijlstra
  2021-10-26 12:01 ` [PATCH v3 03/16] objtool: Shrink struct instruction Peter Zijlstra
                   ` (17 subsequent siblings)
  19 siblings, 2 replies; 48+ messages in thread
From: Peter Zijlstra @ 2021-10-26 12:01 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers, bpf

Assume ALTERNATIVE()s know what they're doing and do not change, or
cause to change, instructions in .altinstr_replacement sections.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 tools/objtool/check.c |   42 +++++++++++++++++++++++++++++++-----------
 1 file changed, 31 insertions(+), 11 deletions(-)

--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -993,18 +993,27 @@ static void remove_insn_ops(struct instr
 	}
 }
 
-static void add_call_dest(struct objtool_file *file, struct instruction *insn,
-			  struct symbol *dest, bool sibling)
+static void annotate_call_site(struct objtool_file *file,
+			       struct instruction *insn, bool sibling)
 {
 	struct reloc *reloc = insn_reloc(file, insn);
+	struct symbol *sym = insn->call_dest;
 
-	insn->call_dest = dest;
-	if (!dest)
+	if (!sym)
+		sym = reloc->sym;
+
+	/*
+	 * Alternative replacement code is just template code which is
+	 * sometimes copied to the original instruction, For now, don't
+	 * annotate it. (In the future we might consider annotating the
+	 * original instruction if/when it ever makes sense to do so.).
+	 */
+	if (!strcmp(insn->sec->name, ".altinstr_replacement"))
 		return;
 
-	if (insn->call_dest->static_call_tramp) {
-		list_add_tail(&insn->call_node,
-			      &file->static_call_list);
+	if (sym->static_call_tramp) {
+		list_add_tail(&insn->call_node, &file->static_call_list);
+		return;
 	}
 
 	/*
@@ -1012,7 +1021,7 @@ static void add_call_dest(struct objtool
 	 * so they need a little help, NOP out any KCOV calls from noinstr
 	 * text.
 	 */
-	if (insn->sec->noinstr && insn->call_dest->kcov) {
+	if (insn->sec->noinstr && sym->kcov) {
 		if (reloc) {
 			reloc->type = R_NONE;
 			elf_write_reloc(file->elf, reloc);
@@ -1024,9 +1033,10 @@ static void add_call_dest(struct objtool
 			               : arch_nop_insn(insn->len));
 
 		insn->type = sibling ? INSN_RETURN : INSN_NOP;
+		return;
 	}
 
-	if (mcount && insn->call_dest->fentry) {
+	if (mcount && sym->fentry) {
 		if (sibling)
 			WARN_FUNC("Tail call to __fentry__ !?!?", insn->sec, insn->offset);
 
@@ -1041,9 +1051,17 @@ static void add_call_dest(struct objtool
 
 		insn->type = INSN_NOP;
 
-		list_add_tail(&insn->mcount_loc_node,
-			      &file->mcount_loc_list);
+		list_add_tail(&insn->mcount_loc_node, &file->mcount_loc_list);
+		return;
 	}
+}
+
+static void add_call_dest(struct objtool_file *file, struct instruction *insn,
+			  struct symbol *dest, bool sibling)
+{
+	insn->call_dest = dest;
+	if (!dest)
+		return;
 
 	/*
 	 * Whatever stack impact regular CALLs have, should be undone
@@ -1053,6 +1071,8 @@ static void add_call_dest(struct objtool
 	 * are converted to JUMP, see read_intra_function_calls().
 	 */
 	remove_insn_ops(insn);
+
+	annotate_call_site(file, insn, sibling);
 }
 
 /*



^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 03/16] objtool: Shrink struct instruction
  2021-10-26 12:01 [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
  2021-10-26 12:01 ` [PATCH v3 01/16] objtool: Classify symbols Peter Zijlstra
  2021-10-26 12:01 ` [PATCH v3 02/16] objtool: Explicitly avoid self modifying code in .altinstr_replacement Peter Zijlstra
@ 2021-10-26 12:01 ` Peter Zijlstra
  2021-10-29  8:03   ` [tip: objtool/core] " tip-bot2 for Peter Zijlstra
  2021-10-26 12:01 ` [PATCH v3 04/16] objtool,x86: Replace alternatives with .retpoline_sites Peter Zijlstra
                   ` (16 subsequent siblings)
  19 siblings, 1 reply; 48+ messages in thread
From: Peter Zijlstra @ 2021-10-26 12:01 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers, bpf

Any one instruction can only ever call a single function, therefore
insn->mcount_loc_node is superfluous and can use insn->call_node.

This shrinks struct instruction, which is by far the most numerous
structure objtool creates.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 tools/objtool/check.c                 |    6 +++---
 tools/objtool/include/objtool/check.h |    1 -
 2 files changed, 3 insertions(+), 4 deletions(-)

--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -701,7 +701,7 @@ static int create_mcount_loc_sections(st
 		return 0;
 
 	idx = 0;
-	list_for_each_entry(insn, &file->mcount_loc_list, mcount_loc_node)
+	list_for_each_entry(insn, &file->mcount_loc_list, call_node)
 		idx++;
 
 	sec = elf_create_section(file->elf, "__mcount_loc", 0, sizeof(unsigned long), idx);
@@ -709,7 +709,7 @@ static int create_mcount_loc_sections(st
 		return -1;
 
 	idx = 0;
-	list_for_each_entry(insn, &file->mcount_loc_list, mcount_loc_node) {
+	list_for_each_entry(insn, &file->mcount_loc_list, call_node) {
 
 		loc = (unsigned long *)sec->data->d_buf + idx;
 		memset(loc, 0, sizeof(unsigned long));
@@ -1049,7 +1049,7 @@ static void annotate_call_site(struct ob
 
 		insn->type = INSN_NOP;
 
-		list_add_tail(&insn->mcount_loc_node, &file->mcount_loc_list);
+		list_add_tail(&insn->call_node, &file->mcount_loc_list);
 		return;
 	}
 }
--- a/tools/objtool/include/objtool/check.h
+++ b/tools/objtool/include/objtool/check.h
@@ -40,7 +40,6 @@ struct instruction {
 	struct list_head list;
 	struct hlist_node hash;
 	struct list_head call_node;
-	struct list_head mcount_loc_node;
 	struct section *sec;
 	unsigned long offset;
 	unsigned int len;



^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 04/16] objtool,x86: Replace alternatives with .retpoline_sites
  2021-10-26 12:01 [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
                   ` (2 preceding siblings ...)
  2021-10-26 12:01 ` [PATCH v3 03/16] objtool: Shrink struct instruction Peter Zijlstra
@ 2021-10-26 12:01 ` Peter Zijlstra
  2021-10-29  8:03   ` [tip: objtool/core] " tip-bot2 for Peter Zijlstra
  2021-10-26 12:01 ` [PATCH v3 05/16] x86/retpoline: Remove unused replacement symbols Peter Zijlstra
                   ` (15 subsequent siblings)
  19 siblings, 1 reply; 48+ messages in thread
From: Peter Zijlstra @ 2021-10-26 12:01 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers, bpf

Instead of writing complete alternatives, simply provide a list of all
the retpoline thunk calls. Then the kernel is free to do with them as
it pleases. Simpler code all-round.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/vmlinux.lds.S       |   14 +++
 tools/objtool/arch/x86/decode.c     |  120 --------------------------------
 tools/objtool/check.c               |  132 +++++++++++++++++++++++++-----------
 tools/objtool/elf.c                 |   84 ----------------------
 tools/objtool/include/objtool/elf.h |    1 
 tools/objtool/special.c             |    8 --
 6 files changed, 107 insertions(+), 252 deletions(-)

--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -272,6 +272,20 @@ SECTIONS
 		__parainstructions_end = .;
 	}
 
+#ifdef CONFIG_RETPOLINE
+	/*
+	 * List of instructions that call/jmp/jcc to retpoline thunks
+	 * __x86_indirect_thunk_*(). These instructions can be patched along
+	 * with alternatives, after which the section can be freed.
+	 */
+	. = ALIGN(8);
+	.retpoline_sites : AT(ADDR(.retpoline_sites) - LOAD_OFFSET) {
+		__retpoline_sites = .;
+		*(.retpoline_sites)
+		__retpoline_sites_end = .;
+	}
+#endif
+
 	/*
 	 * struct alt_inst entries. From the header (alternative.h):
 	 * "Alternative instructions for different CPU types or capabilities"
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -711,126 +711,6 @@ const char *arch_ret_insn(int len)
 	return ret[len-1];
 }
 
-/* asm/alternative.h ? */
-
-#define ALTINSTR_FLAG_INV	(1 << 15)
-#define ALT_NOT(feat)		((feat) | ALTINSTR_FLAG_INV)
-
-struct alt_instr {
-	s32 instr_offset;	/* original instruction */
-	s32 repl_offset;	/* offset to replacement instruction */
-	u16 cpuid;		/* cpuid bit set for replacement */
-	u8  instrlen;		/* length of original instruction */
-	u8  replacementlen;	/* length of new instruction */
-} __packed;
-
-static int elf_add_alternative(struct elf *elf,
-			       struct instruction *orig, struct symbol *sym,
-			       int cpuid, u8 orig_len, u8 repl_len)
-{
-	const int size = sizeof(struct alt_instr);
-	struct alt_instr *alt;
-	struct section *sec;
-	Elf_Scn *s;
-
-	sec = find_section_by_name(elf, ".altinstructions");
-	if (!sec) {
-		sec = elf_create_section(elf, ".altinstructions",
-					 SHF_ALLOC, 0, 0);
-
-		if (!sec) {
-			WARN_ELF("elf_create_section");
-			return -1;
-		}
-	}
-
-	s = elf_getscn(elf->elf, sec->idx);
-	if (!s) {
-		WARN_ELF("elf_getscn");
-		return -1;
-	}
-
-	sec->data = elf_newdata(s);
-	if (!sec->data) {
-		WARN_ELF("elf_newdata");
-		return -1;
-	}
-
-	sec->data->d_size = size;
-	sec->data->d_align = 1;
-
-	alt = sec->data->d_buf = malloc(size);
-	if (!sec->data->d_buf) {
-		perror("malloc");
-		return -1;
-	}
-	memset(sec->data->d_buf, 0, size);
-
-	if (elf_add_reloc_to_insn(elf, sec, sec->sh.sh_size,
-				  R_X86_64_PC32, orig->sec, orig->offset)) {
-		WARN("elf_create_reloc: alt_instr::instr_offset");
-		return -1;
-	}
-
-	if (elf_add_reloc(elf, sec, sec->sh.sh_size + 4,
-			  R_X86_64_PC32, sym, 0)) {
-		WARN("elf_create_reloc: alt_instr::repl_offset");
-		return -1;
-	}
-
-	alt->cpuid = bswap_if_needed(cpuid);
-	alt->instrlen = orig_len;
-	alt->replacementlen = repl_len;
-
-	sec->sh.sh_size += size;
-	sec->changed = true;
-
-	return 0;
-}
-
-#define X86_FEATURE_RETPOLINE                ( 7*32+12)
-
-int arch_rewrite_retpolines(struct objtool_file *file)
-{
-	struct instruction *insn;
-	struct reloc *reloc;
-	struct symbol *sym;
-	char name[32] = "";
-
-	list_for_each_entry(insn, &file->retpoline_call_list, call_node) {
-
-		if (insn->type != INSN_JUMP_DYNAMIC &&
-		    insn->type != INSN_CALL_DYNAMIC)
-			continue;
-
-		if (!strcmp(insn->sec->name, ".text.__x86.indirect_thunk"))
-			continue;
-
-		reloc = insn->reloc;
-
-		sprintf(name, "__x86_indirect_alt_%s_%s",
-			insn->type == INSN_JUMP_DYNAMIC ? "jmp" : "call",
-			reloc->sym->name + 21);
-
-		sym = find_symbol_by_name(file->elf, name);
-		if (!sym) {
-			sym = elf_create_undef_symbol(file->elf, name);
-			if (!sym) {
-				WARN("elf_create_undef_symbol");
-				return -1;
-			}
-		}
-
-		if (elf_add_alternative(file->elf, insn, sym,
-					ALT_NOT(X86_FEATURE_RETPOLINE), 5, 5)) {
-			WARN("elf_add_alternative");
-			return -1;
-		}
-	}
-
-	return 0;
-}
-
 int arch_decode_hint_reg(u8 sp_reg, int *base)
 {
 	switch (sp_reg) {
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -683,6 +683,52 @@ static int create_static_call_sections(s
 	return 0;
 }
 
+static int create_retpoline_sites_sections(struct objtool_file *file)
+{
+	struct instruction *insn;
+	struct section *sec;
+	int idx;
+
+	sec = find_section_by_name(file->elf, ".retpoline_sites");
+	if (sec) {
+		WARN("file already has .retpoline_sites, skipping");
+		return 0;
+	}
+
+	idx = 0;
+	list_for_each_entry(insn, &file->retpoline_call_list, call_node)
+		idx++;
+
+	if (!idx)
+		return 0;
+
+	sec = elf_create_section(file->elf, ".retpoline_sites", 0,
+				 sizeof(int), idx);
+	if (!sec) {
+		WARN("elf_create_section: .retpoline_sites");
+		return -1;
+	}
+
+	idx = 0;
+	list_for_each_entry(insn, &file->retpoline_call_list, call_node) {
+
+		int *site = (int *)sec->data->d_buf + idx;
+		*site = 0;
+
+		if (elf_add_reloc_to_insn(file->elf, sec,
+					  idx * sizeof(int),
+					  R_X86_64_PC32,
+					  insn->sec, insn->offset)) {
+			WARN("elf_add_reloc_to_insn: .retpoline_sites");
+			return -1;
+		}
+
+		idx++;
+	}
+
+	return 0;
+}
+
 static int create_mcount_loc_sections(struct objtool_file *file)
 {
 	struct section *sec;
@@ -1013,6 +1059,11 @@ static void annotate_call_site(struct ob
 		return;
 	}
 
+	if (sym->retpoline_thunk) {
+		list_add_tail(&insn->call_node, &file->retpoline_call_list);
+		return;
+	}
+
 	/*
 	 * Many compilers cannot disable KCOV with a function attribute
 	 * so they need a little help, NOP out any KCOV calls from noinstr
@@ -1073,6 +1124,39 @@ static void add_call_dest(struct objtool
 	annotate_call_site(file, insn, sibling);
 }
 
+static void add_retpoline_call(struct objtool_file *file, struct instruction *insn)
+{
+	/*
+	 * Retpoline calls/jumps are really dynamic calls/jumps in disguise,
+	 * so convert them accordingly.
+	 */
+	switch (insn->type) {
+	case INSN_CALL:
+		insn->type = INSN_CALL_DYNAMIC;
+		break;
+	case INSN_JUMP_UNCONDITIONAL:
+		insn->type = INSN_JUMP_DYNAMIC;
+		break;
+	case INSN_JUMP_CONDITIONAL:
+		insn->type = INSN_JUMP_DYNAMIC_CONDITIONAL;
+		break;
+	default:
+		return;
+	}
+
+	insn->retpoline_safe = true;
+
+	/*
+	 * Whatever stack impact regular CALLs have, should be undone
+	 * by the RETURN of the called function.
+	 *
+	 * Annotated intra-function calls retain the stack_ops but
+	 * are converted to JUMP, see read_intra_function_calls().
+	 */
+	remove_insn_ops(insn);
+
+	annotate_call_site(file, insn, false);
+}
 /*
  * Find the destination instructions for all jumps.
  */
@@ -1095,19 +1179,7 @@ static int add_jump_destinations(struct
 			dest_sec = reloc->sym->sec;
 			dest_off = arch_dest_reloc_offset(reloc->addend);
 		} else if (reloc->sym->retpoline_thunk) {
-			/*
-			 * Retpoline jumps are really dynamic jumps in
-			 * disguise, so convert them accordingly.
-			 */
-			if (insn->type == INSN_JUMP_UNCONDITIONAL)
-				insn->type = INSN_JUMP_DYNAMIC;
-			else
-				insn->type = INSN_JUMP_DYNAMIC_CONDITIONAL;
-
-			list_add_tail(&insn->call_node,
-				      &file->retpoline_call_list);
-
-			insn->retpoline_safe = true;
+			add_retpoline_call(file, insn);
 			continue;
 		} else if (insn->func) {
 			/* internal or external sibling call (with reloc) */
@@ -1236,18 +1308,7 @@ static int add_call_destinations(struct
 			add_call_dest(file, insn, dest, false);
 
 		} else if (reloc->sym->retpoline_thunk) {
-			/*
-			 * Retpoline calls are really dynamic calls in
-			 * disguise, so convert them accordingly.
-			 */
-			insn->type = INSN_CALL_DYNAMIC;
-			insn->retpoline_safe = true;
-
-			list_add_tail(&insn->call_node,
-				      &file->retpoline_call_list);
-
-			remove_insn_ops(insn);
-			continue;
+			add_retpoline_call(file, insn);
 
 		} else
 			add_call_dest(file, insn, reloc->sym, false);
@@ -1972,11 +2033,6 @@ static void mark_rodata(struct objtool_f
 	file->rodata = found;
 }
 
-__weak int arch_rewrite_retpolines(struct objtool_file *file)
-{
-	return 0;
-}
-
 static int decode_sections(struct objtool_file *file)
 {
 	int ret;
@@ -2049,15 +2105,6 @@ static int decode_sections(struct objtoo
 	if (ret)
 		return ret;
 
-	/*
-	 * Must be after add_special_section_alts(), since this will emit
-	 * alternatives. Must be after add_{jump,call}_destination(), since
-	 * those create the call insn lists.
-	 */
-	ret = arch_rewrite_retpolines(file);
-	if (ret)
-		return ret;
-
 	return 0;
 }
 
@@ -3460,6 +3507,13 @@ int check(struct objtool_file *file)
 		goto out;
 	warnings += ret;
 
+	if (retpoline) {
+		ret = create_retpoline_sites_sections(file);
+		if (ret < 0)
+			goto out;
+		warnings += ret;
+	}
+
 	if (mcount) {
 		ret = create_mcount_loc_sections(file);
 		if (ret < 0)
--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -741,90 +741,6 @@ static int elf_add_string(struct elf *el
 	return len;
 }
 
-struct symbol *elf_create_undef_symbol(struct elf *elf, const char *name)
-{
-	struct section *symtab, *symtab_shndx;
-	struct symbol *sym;
-	Elf_Data *data;
-	Elf_Scn *s;
-
-	sym = malloc(sizeof(*sym));
-	if (!sym) {
-		perror("malloc");
-		return NULL;
-	}
-	memset(sym, 0, sizeof(*sym));
-
-	sym->name = strdup(name);
-
-	sym->sym.st_name = elf_add_string(elf, NULL, sym->name);
-	if (sym->sym.st_name == -1)
-		return NULL;
-
-	sym->sym.st_info = GELF_ST_INFO(STB_GLOBAL, STT_NOTYPE);
-	// st_other 0
-	// st_shndx 0
-	// st_value 0
-	// st_size 0
-
-	symtab = find_section_by_name(elf, ".symtab");
-	if (!symtab) {
-		WARN("can't find .symtab");
-		return NULL;
-	}
-
-	s = elf_getscn(elf->elf, symtab->idx);
-	if (!s) {
-		WARN_ELF("elf_getscn");
-		return NULL;
-	}
-
-	data = elf_newdata(s);
-	if (!data) {
-		WARN_ELF("elf_newdata");
-		return NULL;
-	}
-
-	data->d_buf = &sym->sym;
-	data->d_size = sizeof(sym->sym);
-	data->d_align = 1;
-	data->d_type = ELF_T_SYM;
-
-	sym->idx = symtab->sh.sh_size / sizeof(sym->sym);
-
-	symtab->sh.sh_size += data->d_size;
-	symtab->changed = true;
-
-	symtab_shndx = find_section_by_name(elf, ".symtab_shndx");
-	if (symtab_shndx) {
-		s = elf_getscn(elf->elf, symtab_shndx->idx);
-		if (!s) {
-			WARN_ELF("elf_getscn");
-			return NULL;
-		}
-
-		data = elf_newdata(s);
-		if (!data) {
-			WARN_ELF("elf_newdata");
-			return NULL;
-		}
-
-		data->d_buf = &sym->sym.st_size; /* conveniently 0 */
-		data->d_size = sizeof(Elf32_Word);
-		data->d_align = 4;
-		data->d_type = ELF_T_WORD;
-
-		symtab_shndx->sh.sh_size += 4;
-		symtab_shndx->changed = true;
-	}
-
-	sym->sec = find_section_by_index(elf, 0);
-
-	elf_add_symbol(elf, sym);
-
-	return sym;
-}
-
 struct section *elf_create_section(struct elf *elf, const char *name,
 				   unsigned int sh_flags, size_t entsize, int nr)
 {
--- a/tools/objtool/include/objtool/elf.h
+++ b/tools/objtool/include/objtool/elf.h
@@ -142,7 +142,6 @@ int elf_write_insn(struct elf *elf, stru
 		   unsigned long offset, unsigned int len,
 		   const char *insn);
 int elf_write_reloc(struct elf *elf, struct reloc *reloc);
-struct symbol *elf_create_undef_symbol(struct elf *elf, const char *name);
 int elf_write(struct elf *elf);
 void elf_close(struct elf *elf);
 
--- a/tools/objtool/special.c
+++ b/tools/objtool/special.c
@@ -109,14 +109,6 @@ static int get_alt_entry(struct elf *elf
 			return -1;
 		}
 
-		/*
-		 * Skip retpoline .altinstr_replacement... we already rewrite the
-		 * instructions for retpolines anyway, see arch_is_retpoline()
-		 * usage in add_{call,jump}_destinations().
-		 */
-		if (arch_is_retpoline(new_reloc->sym))
-			return 1;
-
 		reloc_to_sec_off(new_reloc, &alt->new_sec, &alt->new_off);
 
 		/* _ASM_EXTABLE_EX hack */



^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 05/16] x86/retpoline: Remove unused replacement symbols
  2021-10-26 12:01 [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
                   ` (3 preceding siblings ...)
  2021-10-26 12:01 ` [PATCH v3 04/16] objtool,x86: Replace alternatives with .retpoline_sites Peter Zijlstra
@ 2021-10-26 12:01 ` Peter Zijlstra
  2021-10-29  8:03   ` [tip: objtool/core] " tip-bot2 for Peter Zijlstra
  2021-10-26 12:01 ` [PATCH v3 06/16] x86/asm: Fix register order Peter Zijlstra
                   ` (14 subsequent siblings)
  19 siblings, 1 reply; 48+ messages in thread
From: Peter Zijlstra @ 2021-10-26 12:01 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers, bpf

Now that objtool no longer creates alternatives, these replacement
symbols are no longer needed, remove them.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/asm-prototypes.h |   10 --------
 arch/x86/lib/retpoline.S              |   42 ----------------------------------
 2 files changed, 52 deletions(-)

--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -24,14 +24,4 @@ extern void cmpxchg8b_emu(void);
 	extern asmlinkage void __x86_indirect_thunk_ ## reg (void);
 #include <asm/GEN-for-each-reg.h>
 
-#undef GEN
-#define GEN(reg) \
-	extern asmlinkage void __x86_indirect_alt_call_ ## reg (void);
-#include <asm/GEN-for-each-reg.h>
-
-#undef GEN
-#define GEN(reg) \
-	extern asmlinkage void __x86_indirect_alt_jmp_ ## reg (void);
-#include <asm/GEN-for-each-reg.h>
-
 #endif /* CONFIG_RETPOLINE */
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -41,36 +41,6 @@ SYM_FUNC_END(__x86_indirect_thunk_\reg)
 .endm
 
 /*
- * This generates .altinstr_replacement symbols for use by objtool. They,
- * however, must not actually live in .altinstr_replacement since that will be
- * discarded after init, but module alternatives will also reference these
- * symbols.
- *
- * Their names matches the "__x86_indirect_" prefix to mark them as retpolines.
- */
-.macro ALT_THUNK reg
-
-	.align 1
-
-SYM_FUNC_START_NOALIGN(__x86_indirect_alt_call_\reg)
-	ANNOTATE_RETPOLINE_SAFE
-1:	call	*%\reg
-2:	.skip	5-(2b-1b), 0x90
-SYM_FUNC_END(__x86_indirect_alt_call_\reg)
-
-STACK_FRAME_NON_STANDARD(__x86_indirect_alt_call_\reg)
-
-SYM_FUNC_START_NOALIGN(__x86_indirect_alt_jmp_\reg)
-	ANNOTATE_RETPOLINE_SAFE
-1:	jmp	*%\reg
-2:	.skip	5-(2b-1b), 0x90
-SYM_FUNC_END(__x86_indirect_alt_jmp_\reg)
-
-STACK_FRAME_NON_STANDARD(__x86_indirect_alt_jmp_\reg)
-
-.endm
-
-/*
  * Despite being an assembler file we can't just use .irp here
  * because __KSYM_DEPS__ only uses the C preprocessor and would
  * only see one instance of "__x86_indirect_thunk_\reg" rather
@@ -92,15 +62,3 @@ STACK_FRAME_NON_STANDARD(__x86_indirect_
 #undef GEN
 #define GEN(reg) EXPORT_THUNK(reg)
 #include <asm/GEN-for-each-reg.h>
-
-#undef GEN
-#define GEN(reg) ALT_THUNK reg
-#include <asm/GEN-for-each-reg.h>
-
-#undef GEN
-#define GEN(reg) __EXPORT_THUNK(__x86_indirect_alt_call_ ## reg)
-#include <asm/GEN-for-each-reg.h>
-
-#undef GEN
-#define GEN(reg) __EXPORT_THUNK(__x86_indirect_alt_jmp_ ## reg)
-#include <asm/GEN-for-each-reg.h>



^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 06/16] x86/asm: Fix register order
  2021-10-26 12:01 [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
                   ` (4 preceding siblings ...)
  2021-10-26 12:01 ` [PATCH v3 05/16] x86/retpoline: Remove unused replacement symbols Peter Zijlstra
@ 2021-10-26 12:01 ` Peter Zijlstra
  2021-10-29  8:03   ` [tip: objtool/core] " tip-bot2 for Peter Zijlstra
  2021-10-26 12:01 ` [PATCH v3 07/16] x86/asm: Fixup odd GEN-for-each-reg.h usage Peter Zijlstra
                   ` (13 subsequent siblings)
  19 siblings, 1 reply; 48+ messages in thread
From: Peter Zijlstra @ 2021-10-26 12:01 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers, bpf

Ensure the register order is correct; this allows for easy translation
between register number and trampoline and vice-versa.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/GEN-for-each-reg.h |   13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

--- a/arch/x86/include/asm/GEN-for-each-reg.h
+++ b/arch/x86/include/asm/GEN-for-each-reg.h
@@ -1,11 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * These are in machine order; things rely on that.
+ */
 #ifdef CONFIG_64BIT
 GEN(rax)
-GEN(rbx)
 GEN(rcx)
 GEN(rdx)
+GEN(rbx)
+GEN(rsp)
+GEN(rbp)
 GEN(rsi)
 GEN(rdi)
-GEN(rbp)
 GEN(r8)
 GEN(r9)
 GEN(r10)
@@ -16,10 +20,11 @@ GEN(r14)
 GEN(r15)
 #else
 GEN(eax)
-GEN(ebx)
 GEN(ecx)
 GEN(edx)
+GEN(ebx)
+GEN(esp)
+GEN(ebp)
 GEN(esi)
 GEN(edi)
-GEN(ebp)
 #endif



^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 07/16] x86/asm: Fixup odd GEN-for-each-reg.h usage
  2021-10-26 12:01 [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
                   ` (5 preceding siblings ...)
  2021-10-26 12:01 ` [PATCH v3 06/16] x86/asm: Fix register order Peter Zijlstra
@ 2021-10-26 12:01 ` Peter Zijlstra
  2021-10-29  8:03   ` [tip: objtool/core] " tip-bot2 for Peter Zijlstra
  2021-10-26 12:01 ` [PATCH v3 08/16] x86/retpoline: Move the retpoline thunk declarations to nospec-branch.h Peter Zijlstra
                   ` (12 subsequent siblings)
  19 siblings, 1 reply; 48+ messages in thread
From: Peter Zijlstra @ 2021-10-26 12:01 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers, bpf

Currently GEN-for-each-reg.h usage leaves GEN defined, relying on any
subsequent usage to start with #undef, which is rude.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/asm-prototypes.h |    2 +-
 arch/x86/lib/retpoline.S              |    4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -19,9 +19,9 @@ extern void cmpxchg8b_emu(void);
 
 #ifdef CONFIG_RETPOLINE
 
-#undef GEN
 #define GEN(reg) \
 	extern asmlinkage void __x86_indirect_thunk_ ## reg (void);
 #include <asm/GEN-for-each-reg.h>
+#undef GEN
 
 #endif /* CONFIG_RETPOLINE */
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -55,10 +55,10 @@ SYM_FUNC_END(__x86_indirect_thunk_\reg)
 #define __EXPORT_THUNK(sym)	_ASM_NOKPROBE(sym); EXPORT_SYMBOL(sym)
 #define EXPORT_THUNK(reg)	__EXPORT_THUNK(__x86_indirect_thunk_ ## reg)
 
-#undef GEN
 #define GEN(reg) THUNK reg
 #include <asm/GEN-for-each-reg.h>
-
 #undef GEN
+
 #define GEN(reg) EXPORT_THUNK(reg)
 #include <asm/GEN-for-each-reg.h>
+#undef GEN



^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 08/16] x86/retpoline: Move the retpoline thunk declarations to nospec-branch.h
  2021-10-26 12:01 [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
                   ` (6 preceding siblings ...)
  2021-10-26 12:01 ` [PATCH v3 07/16] x86/asm: Fixup odd GEN-for-each-reg.h usage Peter Zijlstra
@ 2021-10-26 12:01 ` Peter Zijlstra
  2021-10-29  8:03   ` [tip: objtool/core] " tip-bot2 for Peter Zijlstra
  2021-10-26 12:01 ` [PATCH v3 09/16] x86/retpoline: Create a retpoline thunk array Peter Zijlstra
                   ` (11 subsequent siblings)
  19 siblings, 1 reply; 48+ messages in thread
From: Peter Zijlstra @ 2021-10-26 12:01 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers, bpf

Because it makes no sense to split the retpoline gunk over multiple
headers.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/asm-prototypes.h |    8 --------
 arch/x86/include/asm/nospec-branch.h  |    7 +++++++
 arch/x86/net/bpf_jit_comp.c           |    1 -
 3 files changed, 7 insertions(+), 9 deletions(-)

--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -17,11 +17,3 @@
 extern void cmpxchg8b_emu(void);
 #endif
 
-#ifdef CONFIG_RETPOLINE
-
-#define GEN(reg) \
-	extern asmlinkage void __x86_indirect_thunk_ ## reg (void);
-#include <asm/GEN-for-each-reg.h>
-#undef GEN
-
-#endif /* CONFIG_RETPOLINE */
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -5,6 +5,7 @@
 
 #include <linux/static_key.h>
 #include <linux/objtool.h>
+#include <linux/linkage.h>
 
 #include <asm/alternative.h>
 #include <asm/cpufeatures.h>
@@ -118,6 +119,12 @@
 	".popsection\n\t"
 
 #ifdef CONFIG_RETPOLINE
+
+#define GEN(reg) \
+	extern asmlinkage void __x86_indirect_thunk_ ## reg (void);
+#include <asm/GEN-for-each-reg.h>
+#undef GEN
+
 #ifdef CONFIG_X86_64
 
 /*
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -15,7 +15,6 @@
 #include <asm/set_memory.h>
 #include <asm/nospec-branch.h>
 #include <asm/text-patching.h>
-#include <asm/asm-prototypes.h>
 
 static u8 *emit_code(u8 *ptr, u32 bytes, unsigned int len)
 {



^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 09/16] x86/retpoline: Create a retpoline thunk array
  2021-10-26 12:01 [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
                   ` (7 preceding siblings ...)
  2021-10-26 12:01 ` [PATCH v3 08/16] x86/retpoline: Move the retpoline thunk declarations to nospec-branch.h Peter Zijlstra
@ 2021-10-26 12:01 ` Peter Zijlstra
  2021-10-29  8:03   ` [tip: objtool/core] " tip-bot2 for Peter Zijlstra
  2021-10-26 12:01 ` [PATCH v3 10/16] x86/alternative: Implement .retpoline_sites support Peter Zijlstra
                   ` (10 subsequent siblings)
  19 siblings, 1 reply; 48+ messages in thread
From: Peter Zijlstra @ 2021-10-26 12:01 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers, bpf

Stick all the retpolines in a single symbol and have the individual
thunks as inner labels, this should guarantee thunk order and layout.

Previously there were 16 (or rather 15 without rsp) separate symbols and
a toolchain might reasonably expect it could displace them however it
liked, with disregard for their relative position.

However, now they're part of a larger symbol. Any change to their
relative position would disrupt this larger _array symbol and thus not
be sound.

This is the same reasoning used for data symbols. On their own there
is no guarantee about their relative position wrt to one aonther, but
we're still able to do arrays because an array as a whole is a single
larger symbol.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/nospec-branch.h |    8 +++++++-
 arch/x86/lib/retpoline.S             |   14 +++++++++-----
 2 files changed, 16 insertions(+), 6 deletions(-)

--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -12,6 +12,8 @@
 #include <asm/msr-index.h>
 #include <asm/unwind_hints.h>
 
+#define RETPOLINE_THUNK_SIZE	32
+
 /*
  * Fill the CPU return stack buffer.
  *
@@ -120,11 +122,15 @@
 
 #ifdef CONFIG_RETPOLINE
 
+typedef u8 retpoline_thunk_t[RETPOLINE_THUNK_SIZE];
+
 #define GEN(reg) \
-	extern asmlinkage void __x86_indirect_thunk_ ## reg (void);
+	extern retpoline_thunk_t __x86_indirect_thunk_ ## reg;
 #include <asm/GEN-for-each-reg.h>
 #undef GEN
 
+extern retpoline_thunk_t __x86_indirect_thunk_array[];
+
 #ifdef CONFIG_X86_64
 
 /*
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -28,16 +28,14 @@
 
 .macro THUNK reg
 
-	.align 32
-
-SYM_FUNC_START(__x86_indirect_thunk_\reg)
+	.align RETPOLINE_THUNK_SIZE
+SYM_INNER_LABEL(__x86_indirect_thunk_\reg, SYM_L_GLOBAL)
+	UNWIND_HINT_EMPTY
 
 	ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; jmp *%\reg), \
 		      __stringify(RETPOLINE \reg), X86_FEATURE_RETPOLINE, \
 		      __stringify(lfence; ANNOTATE_RETPOLINE_SAFE; jmp *%\reg), X86_FEATURE_RETPOLINE_AMD
 
-SYM_FUNC_END(__x86_indirect_thunk_\reg)
-
 .endm
 
 /*
@@ -55,10 +53,16 @@ SYM_FUNC_END(__x86_indirect_thunk_\reg)
 #define __EXPORT_THUNK(sym)	_ASM_NOKPROBE(sym); EXPORT_SYMBOL(sym)
 #define EXPORT_THUNK(reg)	__EXPORT_THUNK(__x86_indirect_thunk_ ## reg)
 
+	.align RETPOLINE_THUNK_SIZE
+SYM_CODE_START(__x86_indirect_thunk_array)
+
 #define GEN(reg) THUNK reg
 #include <asm/GEN-for-each-reg.h>
 #undef GEN
 
+	.align RETPOLINE_THUNK_SIZE
+SYM_CODE_END(__x86_indirect_thunk_array)
+
 #define GEN(reg) EXPORT_THUNK(reg)
 #include <asm/GEN-for-each-reg.h>
 #undef GEN



^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 10/16] x86/alternative: Implement .retpoline_sites support
  2021-10-26 12:01 [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
                   ` (8 preceding siblings ...)
  2021-10-26 12:01 ` [PATCH v3 09/16] x86/retpoline: Create a retpoline thunk array Peter Zijlstra
@ 2021-10-26 12:01 ` Peter Zijlstra
  2021-10-27 17:38   ` Borislav Petkov
  2021-10-29  8:02   ` [tip: objtool/core] " tip-bot2 for Peter Zijlstra
  2021-10-26 12:01 ` [PATCH v3 11/16] x86/alternative: Handle Jcc __x86_indirect_thunk_\reg Peter Zijlstra
                   ` (9 subsequent siblings)
  19 siblings, 2 replies; 48+ messages in thread
From: Peter Zijlstra @ 2021-10-26 12:01 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers, bpf

Rewrite retpoline thunk call sites to be indirect calls for
spectre_v2=off. This ensures spectre_v2=off is as near to a
RETPOLINE=n build as possible.

This is the replacement for objtool writing alternative entries to
ensure the same and achieves feature-parity with the previous
approach.

One noteworthy feature is that it relies on the thunks to be in
machine order to compute the register index.

Specifically, this does not yet address the Jcc __x86_indirect_thunk_*
calls generated by clang, a future patch will add this.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/um/kernel/um_arch.c           |    4 +
 arch/x86/include/asm/alternative.h |    1 
 arch/x86/kernel/alternative.c      |  141 +++++++++++++++++++++++++++++++++++--
 arch/x86/kernel/module.c           |    9 ++
 4 files changed, 150 insertions(+), 5 deletions(-)

--- a/arch/um/kernel/um_arch.c
+++ b/arch/um/kernel/um_arch.c
@@ -421,6 +421,10 @@ void __init check_bugs(void)
 	os_check_bugs();
 }
 
+void apply_retpolines(s32 *start, s32 *end)
+{
+}
+
 void apply_alternatives(struct alt_instr *start, struct alt_instr *end)
 {
 }
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -75,6 +75,7 @@ extern int alternatives_patched;
 
 extern void alternative_instructions(void);
 extern void apply_alternatives(struct alt_instr *start, struct alt_instr *end);
+extern void apply_retpolines(s32 *start, s32 *end);
 
 struct module;
 
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -29,6 +29,7 @@
 #include <asm/io.h>
 #include <asm/fixmap.h>
 #include <asm/paravirt.h>
+#include <asm/asm-prototypes.h>
 
 int __read_mostly alternatives_patched;
 
@@ -113,6 +114,7 @@ static void __init_or_module add_nops(vo
 	}
 }
 
+extern s32 __retpoline_sites[], __retpoline_sites_end[];
 extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
 extern s32 __smp_locks[], __smp_locks_end[];
 void text_poke_early(void *addr, const void *opcode, size_t len);
@@ -221,7 +223,7 @@ static __always_inline int optimize_nops
  * "noinline" to cause control flow change and thus invalidate I$ and
  * cause refetch after modification.
  */
-static void __init_or_module noinline optimize_nops(struct alt_instr *a, u8 *instr)
+static void __init_or_module noinline optimize_nops(u8 *instr, size_t len)
 {
 	struct insn insn;
 	int i = 0;
@@ -239,11 +241,11 @@ static void __init_or_module noinline op
 		 * optimized.
 		 */
 		if (insn.length == 1 && insn.opcode.bytes[0] == 0x90)
-			i += optimize_nops_range(instr, a->instrlen, i);
+			i += optimize_nops_range(instr, len, i);
 		else
 			i += insn.length;
 
-		if (i >= a->instrlen)
+		if (i >= len)
 			return;
 	}
 }
@@ -331,10 +333,135 @@ void __init_or_module noinline apply_alt
 		text_poke_early(instr, insn_buff, insn_buff_sz);
 
 next:
-		optimize_nops(a, instr);
+		optimize_nops(instr, a->instrlen);
 	}
 }
 
+#if defined(CONFIG_RETPOLINE) && defined(CONFIG_STACK_VALIDATION)
+
+/*
+ * CALL/JMP *%\reg
+ */
+static int emit_indirect(int op, int reg, u8 *bytes)
+{
+	int i = 0;
+	u8 modrm;
+
+	switch (op) {
+	case CALL_INSN_OPCODE:
+		modrm = 0x10; /* Reg = 2; CALL r/m */
+		break;
+
+	case JMP32_INSN_OPCODE:
+		modrm = 0x20; /* Reg = 4; JMP r/m */
+		break;
+
+	default:
+		WARN_ON_ONCE(1);
+		return -1;
+	}
+
+	if (reg >= 8) {
+		bytes[i++] = 0x41; /* REX.B prefix */
+		reg -= 8;
+	}
+
+	modrm |= 0xc0; /* Mod = 3 */
+	modrm += reg;
+
+	bytes[i++] = 0xff; /* opcode */
+	bytes[i++] = modrm;
+
+	return i;
+}
+
+/*
+ * Rewrite the compiler generated retpoline thunk calls.
+ *
+ * For spectre_v2=off (!X86_FEATURE_RETPOLINE), rewrite them into immediate
+ * indirect instructions, avoiding the extra indirection.
+ *
+ * For example, convert:
+ *
+ *   CALL __x86_indirect_thunk_\reg
+ *
+ * into:
+ *
+ *   CALL *%\reg
+ *
+ */
+static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
+{
+	retpoline_thunk_t *target;
+	int reg, i = 0;
+
+	target = addr + insn->length + insn->immediate.value;
+	reg = target - __x86_indirect_thunk_array;
+
+	if (WARN_ON_ONCE(reg & ~0xf))
+		return -1;
+
+	/* If anyone ever does: CALL/JMP *%rsp, we're in deep trouble. */
+	BUG_ON(reg == 4);
+
+	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE))
+		return -1;
+
+	i = emit_indirect(insn->opcode.bytes[0], reg, bytes);
+	if (i < 0)
+		return i;
+
+	for (; i < insn->length;)
+		bytes[i++] = BYTES_NOP1;
+
+	return i;
+}
+
+/*
+ * Generated by 'objtool --retpoline'.
+ */
+void __init_or_module noinline apply_retpolines(s32 *start, s32 *end)
+{
+	s32 *s;
+
+	for (s = start; s < end; s++) {
+		void *addr = (void *)s + *s;
+		struct insn insn;
+		int len, ret;
+		u8 bytes[16];
+		u8 op1, op2;
+
+		ret = insn_decode_kernel(&insn, addr);
+		if (WARN_ON_ONCE(ret < 0))
+			continue;
+
+		op1 = insn.opcode.bytes[0];
+		op2 = insn.opcode.bytes[1];
+
+		switch (op1) {
+		case CALL_INSN_OPCODE:
+		case JMP32_INSN_OPCODE:
+			break;
+
+		default:
+			WARN_ON_ONCE(1);
+			continue;
+		}
+
+		len = patch_retpoline(addr, &insn, bytes);
+		if (len == insn.length) {
+			optimize_nops(bytes, len);
+			text_poke_early(addr, bytes, len);
+		}
+	}
+}
+
+#else /* !RETPOLINES || !CONFIG_STACK_VALIDATION */
+
+void __init_or_module noinline apply_retpolines(s32 *start, s32 *end) { }
+
+#endif /* CONFIG_RETPOLINE && CONFIG_STACK_VALIDATION */
+
 #ifdef CONFIG_SMP
 static void alternatives_smp_lock(const s32 *start, const s32 *end,
 				  u8 *text, u8 *text_end)
@@ -643,6 +770,12 @@ void __init alternative_instructions(voi
 	apply_paravirt(__parainstructions, __parainstructions_end);
 
 	/*
+	 * Rewrite the retpolines, must be done before alternatives since
+	 * those can rewrite the retpoline thunks.
+	 */
+	apply_retpolines(__retpoline_sites, __retpoline_sites_end);
+
+	/*
 	 * Then patch alternatives, such that those paravirt calls that are in
 	 * alternatives can be overwritten by their immediate fragments.
 	 */
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -251,7 +251,8 @@ int module_finalize(const Elf_Ehdr *hdr,
 		    struct module *me)
 {
 	const Elf_Shdr *s, *text = NULL, *alt = NULL, *locks = NULL,
-		*para = NULL, *orc = NULL, *orc_ip = NULL;
+		*para = NULL, *orc = NULL, *orc_ip = NULL,
+		*retpolines = NULL;
 	char *secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;
 
 	for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) {
@@ -267,8 +268,14 @@ int module_finalize(const Elf_Ehdr *hdr,
 			orc = s;
 		if (!strcmp(".orc_unwind_ip", secstrings + s->sh_name))
 			orc_ip = s;
+		if (!strcmp(".retpoline_sites", secstrings + s->sh_name))
+			retpolines = s;
 	}
 
+	if (retpolines) {
+		void *rseg = (void *)retpolines->sh_addr;
+		apply_retpolines(rseg, rseg + retpolines->sh_size);
+	}
 	if (alt) {
 		/* patch .altinstructions */
 		void *aseg = (void *)alt->sh_addr;



^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 11/16] x86/alternative: Handle Jcc __x86_indirect_thunk_\reg
  2021-10-26 12:01 [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
                   ` (9 preceding siblings ...)
  2021-10-26 12:01 ` [PATCH v3 10/16] x86/alternative: Implement .retpoline_sites support Peter Zijlstra
@ 2021-10-26 12:01 ` Peter Zijlstra
  2021-10-28  9:29   ` Borislav Petkov
  2021-10-29  8:02   ` [tip: objtool/core] " tip-bot2 for Peter Zijlstra
  2021-10-26 12:01 ` [PATCH v3 12/16] x86/alternative: Try inline spectre_v2=retpoline,amd Peter Zijlstra
                   ` (8 subsequent siblings)
  19 siblings, 2 replies; 48+ messages in thread
From: Peter Zijlstra @ 2021-10-26 12:01 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers, bpf

Handle the rare cases where the compiler (clang) does an indirect
conditional tail-call using:

  Jcc __x86_indirect_thunk_\reg

For the !RETPOLINE case this can be rewritten to fit the original (6
byte) instruction like:

  Jncc.d8	1f
  JMP		*%\reg
  NOP
1:

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/alternative.c |   38 ++++++++++++++++++++++++++++++++++----
 1 file changed, 34 insertions(+), 4 deletions(-)

--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -393,7 +393,8 @@ static int emit_indirect(int op, int reg
 static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
 {
 	retpoline_thunk_t *target;
-	int reg, i = 0;
+	int reg, ret, i = 0;
+	u8 op, cc;
 
 	target = addr + insn->length + insn->immediate.value;
 	reg = target - __x86_indirect_thunk_array;
@@ -407,9 +408,34 @@ static int patch_retpoline(void *addr, s
 	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE))
 		return -1;
 
-	i = emit_indirect(insn->opcode.bytes[0], reg, bytes);
-	if (i < 0)
-		return i;
+	op = insn->opcode.bytes[0];
+
+	/*
+	 * Convert:
+	 *
+	 *   Jcc.d32 __x86_indirect_thunk_\reg
+	 *
+	 * into:
+	 *
+	 *   Jncc.d8 1f
+	 *   JMP *%\reg
+	 *   NOP
+	 * 1:
+	 */
+	if (op == 0x0f && (insn->opcode.bytes[1] & 0xf0) == 0x80) {
+		cc = insn->opcode.bytes[1] & 0xf;
+		cc ^= 1; /* invert condition */
+
+		bytes[i++] = 0x70 + cc; /* Jcc.d8 */
+		bytes[i++] = insn->length - 2;
+
+		op = JMP32_INSN_OPCODE;
+	}
+
+	ret = emit_indirect(op, reg, bytes + i);
+	if (ret < 0)
+		return ret;
+	i += ret;
 
 	for (; i < insn->length;)
 		bytes[i++] = BYTES_NOP1;
@@ -443,6 +469,10 @@ void __init_or_module noinline apply_ret
 		case JMP32_INSN_OPCODE:
 			break;
 
+		case 0x0f: /* escape */
+			if (op2 >= 0x80 && op2 <= 0x8f)
+				break;
+			fallthrough;
 		default:
 			WARN_ON_ONCE(1);
 			continue;



^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 12/16] x86/alternative: Try inline spectre_v2=retpoline,amd
  2021-10-26 12:01 [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
                   ` (10 preceding siblings ...)
  2021-10-26 12:01 ` [PATCH v3 11/16] x86/alternative: Handle Jcc __x86_indirect_thunk_\reg Peter Zijlstra
@ 2021-10-26 12:01 ` Peter Zijlstra
  2021-10-29  8:02   ` [tip: objtool/core] " tip-bot2 for Peter Zijlstra
  2021-10-26 12:01 ` [PATCH v3 13/16] x86/alternative: Add debug prints to apply_retpolines() Peter Zijlstra
                   ` (7 subsequent siblings)
  19 siblings, 1 reply; 48+ messages in thread
From: Peter Zijlstra @ 2021-10-26 12:01 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers, bpf

Try and replace retpoline thunk calls with:

  LFENCE
  CALL    *%\reg

for spectre_v2=retpoline,amd.

Specifically, the sequence above is 5 bytes for the low 8 registers,
but 6 bytes for the high 8 registers. This means that unless the
compilers prefix stuff the call with higher registers this replacement
will fail.

Luckily GCC strongly favours RAX for the indirect calls and most (95%+
for defconfig-x86_64) will be converted. OTOH clang strongly favours
R11 and almost nothing gets converted.

Note: it will also generate a correct replacement for the Jcc.d32
case, except unless the compilers start to prefix stuff that, it'll
never fit. Specifically:

  Jncc.d8 1f
  LFENCE
  JMP     *%\reg
1:

is 7-8 bytes long, where the original instruction in unpadded form is
only 6 bytes.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/alternative.c |   16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -389,6 +389,7 @@ static int emit_indirect(int op, int reg
  *
  *   CALL *%\reg
  *
+ * It also tries to inline spectre_v2=retpoline,amd when size permits.
  */
 static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
 {
@@ -405,7 +406,8 @@ static int patch_retpoline(void *addr, s
 	/* If anyone ever does: CALL/JMP *%rsp, we're in deep trouble. */
 	BUG_ON(reg == 4);
 
-	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE))
+	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE) &&
+	    !cpu_feature_enabled(X86_FEATURE_RETPOLINE_AMD))
 		return -1;
 
 	op = insn->opcode.bytes[0];
@@ -418,8 +420,9 @@ static int patch_retpoline(void *addr, s
 	 * into:
 	 *
 	 *   Jncc.d8 1f
+	 *   [ LFENCE ]
 	 *   JMP *%\reg
-	 *   NOP
+	 *   [ NOP ]
 	 * 1:
 	 */
 	if (op == 0x0f && (insn->opcode.bytes[1] & 0xf0) == 0x80) {
@@ -432,6 +435,15 @@ static int patch_retpoline(void *addr, s
 		op = JMP32_INSN_OPCODE;
 	}
 
+	/*
+	 * For RETPOLINE_AMD: prepend the indirect CALL/JMP with an LFENCE.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE_AMD)) {
+		bytes[i++] = 0x0f;
+		bytes[i++] = 0xae;
+		bytes[i++] = 0xe8; /* LFENCE */
+	}
+
 	ret = emit_indirect(op, reg, bytes + i);
 	if (ret < 0)
 		return ret;



^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 13/16] x86/alternative: Add debug prints to apply_retpolines()
  2021-10-26 12:01 [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
                   ` (11 preceding siblings ...)
  2021-10-26 12:01 ` [PATCH v3 12/16] x86/alternative: Try inline spectre_v2=retpoline,amd Peter Zijlstra
@ 2021-10-26 12:01 ` Peter Zijlstra
  2021-10-29  8:02   ` [tip: objtool/core] " tip-bot2 for Peter Zijlstra
  2021-10-26 12:01 ` [PATCH v3 14/16] x86,bugs: Unconditionally allow spectre_v2=retpoline,amd Peter Zijlstra
                   ` (6 subsequent siblings)
  19 siblings, 1 reply; 48+ messages in thread
From: Peter Zijlstra @ 2021-10-26 12:01 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers, bpf

Make sure we can see the text changes when booting with
'debug-alternative'.

Example output:

 [ ] SMP alternatives: retpoline at: __traceiter_initcall_level+0x1f/0x30 (ffffffff8100066f) len: 5 to: __x86_indirect_thunk_rax+0x0/0x20
 [ ] SMP alternatives: ffffffff82603e58: [2:5) optimized NOPs: ff d0 0f 1f 00
 [ ] SMP alternatives: ffffffff8100066f: orig: e8 cc 30 00 01
 [ ] SMP alternatives: ffffffff8100066f: repl: ff d0 0f 1f 00

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/alternative.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -492,9 +492,15 @@ void __init_or_module noinline apply_ret
 			continue;
 		}
 
+		DPRINTK("retpoline at: %pS (%px) len: %d to: %pS",
+			addr, addr, insn.length,
+			addr + insn.length + insn.immediate.value);
+
 		len = patch_retpoline(addr, &insn, bytes);
 		if (len == insn.length) {
 			optimize_nops(bytes, len);
+			DUMP_BYTES(((u8*)addr),  len, "%px: orig: ", addr);
+			DUMP_BYTES(((u8*)bytes), len, "%px: repl: ", addr);
 			text_poke_early(addr, bytes, len);
 		}
 	}



^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 14/16] x86,bugs: Unconditionally allow spectre_v2=retpoline,amd
  2021-10-26 12:01 [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
                   ` (12 preceding siblings ...)
  2021-10-26 12:01 ` [PATCH v3 13/16] x86/alternative: Add debug prints to apply_retpolines() Peter Zijlstra
@ 2021-10-26 12:01 ` Peter Zijlstra
  2021-10-29  8:02   ` [tip: objtool/core] " tip-bot2 for Peter Zijlstra
  2021-10-26 12:01 ` [PATCH v3 15/16] bpf, x86: Simplify computing label offsets Peter Zijlstra
                   ` (5 subsequent siblings)
  19 siblings, 1 reply; 48+ messages in thread
From: Peter Zijlstra @ 2021-10-26 12:01 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers, bpf

Currently Linux prevents usage of retpoline,amd on !AMD hardware, this
is unfriendly and gets in the way of testing. Remove this restriction.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/cpu/bugs.c |    7 -------
 1 file changed, 7 deletions(-)

--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -882,13 +882,6 @@ static enum spectre_v2_mitigation_cmd __
 		return SPECTRE_V2_CMD_AUTO;
 	}
 
-	if (cmd == SPECTRE_V2_CMD_RETPOLINE_AMD &&
-	    boot_cpu_data.x86_vendor != X86_VENDOR_HYGON &&
-	    boot_cpu_data.x86_vendor != X86_VENDOR_AMD) {
-		pr_err("retpoline,amd selected but CPU is not AMD. Switching to AUTO select\n");
-		return SPECTRE_V2_CMD_AUTO;
-	}
-
 	spec_v2_print_cond(mitigation_options[i].option,
 			   mitigation_options[i].secure);
 	return cmd;



^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 15/16] bpf, x86: Simplify computing label offsets
  2021-10-26 12:01 [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
                   ` (13 preceding siblings ...)
  2021-10-26 12:01 ` [PATCH v3 14/16] x86,bugs: Unconditionally allow spectre_v2=retpoline,amd Peter Zijlstra
@ 2021-10-26 12:01 ` Peter Zijlstra
  2021-10-29  8:02   ` [tip: objtool/core] bpf,x86: " tip-bot2 for Peter Zijlstra
  2021-10-26 12:01 ` [PATCH v3 16/16] bpf,x86: Respect X86_FEATURE_RETPOLINE* Peter Zijlstra
                   ` (4 subsequent siblings)
  19 siblings, 1 reply; 48+ messages in thread
From: Peter Zijlstra @ 2021-10-26 12:01 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers, bpf

Take an idea from the 32bit JIT, which uses the multi-pass nature of
the JIT to compute the instruction offsets on a prior pass in order to
compute the relative jump offsets on a later pass.

Application to the x86_64 JIT is slightly more involved because the
offsets depend on program variables (such as callee_regs_used and
stack_depth) and hence the computed offsets need to be kept in the
context of the JIT.

This removes, IMO quite fragile, code that hard-codes the offsets and
tries to compute the length of variable parts of it.

Convert both emit_bpf_tail_call_*() functions which have an out: label
at the end. Additionally emit_bpt_tail_call_direct() also has a poke
table entry, for which it computes the offset from the end (and thus
already relies on the previous pass to have computed addrs[i]), also
convert this to be a forward based offset.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/net/bpf_jit_comp.c |  123 +++++++++++++++-----------------------------
 1 file changed, 42 insertions(+), 81 deletions(-)

--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -224,6 +224,14 @@ static void jit_fill_hole(void *area, un
 
 struct jit_context {
 	int cleanup_addr; /* Epilogue code offset */
+
+	/*
+	 * Program specific offsets of labels in the code; these rely on the
+	 * JIT doing at least 2 passes, recording the position on the first
+	 * pass, only to generate the correct offset on the second pass.
+	 */
+	int tail_call_direct_label;
+	int tail_call_indirect_label;
 };
 
 /* Maximum number of bytes emitted while JITing one eBPF insn */
@@ -379,22 +387,6 @@ int bpf_arch_text_poke(void *ip, enum bp
 	return __bpf_arch_text_poke(ip, t, old_addr, new_addr, true);
 }
 
-static int get_pop_bytes(bool *callee_regs_used)
-{
-	int bytes = 0;
-
-	if (callee_regs_used[3])
-		bytes += 2;
-	if (callee_regs_used[2])
-		bytes += 2;
-	if (callee_regs_used[1])
-		bytes += 2;
-	if (callee_regs_used[0])
-		bytes += 1;
-
-	return bytes;
-}
-
 /*
  * Generate the following code:
  *
@@ -410,29 +402,12 @@ static int get_pop_bytes(bool *callee_re
  * out:
  */
 static void emit_bpf_tail_call_indirect(u8 **pprog, bool *callee_regs_used,
-					u32 stack_depth)
+					u32 stack_depth, u8 *ip,
+					struct jit_context *ctx)
 {
 	int tcc_off = -4 - round_up(stack_depth, 8);
-	u8 *prog = *pprog;
-	int pop_bytes = 0;
-	int off1 = 42;
-	int off2 = 31;
-	int off3 = 9;
-
-	/* count the additional bytes used for popping callee regs from stack
-	 * that need to be taken into account for each of the offsets that
-	 * are used for bailing out of the tail call
-	 */
-	pop_bytes = get_pop_bytes(callee_regs_used);
-	off1 += pop_bytes;
-	off2 += pop_bytes;
-	off3 += pop_bytes;
-
-	if (stack_depth) {
-		off1 += 7;
-		off2 += 7;
-		off3 += 7;
-	}
+	u8 *prog = *pprog, *start = *pprog;
+	int offset;
 
 	/*
 	 * rdi - pointer to ctx
@@ -447,8 +422,9 @@ static void emit_bpf_tail_call_indirect(
 	EMIT2(0x89, 0xD2);                        /* mov edx, edx */
 	EMIT3(0x39, 0x56,                         /* cmp dword ptr [rsi + 16], edx */
 	      offsetof(struct bpf_array, map.max_entries));
-#define OFFSET1 (off1 + RETPOLINE_RCX_BPF_JIT_SIZE) /* Number of bytes to jump */
-	EMIT2(X86_JBE, OFFSET1);                  /* jbe out */
+
+	offset = ctx->tail_call_indirect_label - (prog + 2 - start);
+	EMIT2(X86_JBE, offset);                   /* jbe out */
 
 	/*
 	 * if (tail_call_cnt > MAX_TAIL_CALL_CNT)
@@ -456,8 +432,9 @@ static void emit_bpf_tail_call_indirect(
 	 */
 	EMIT2_off32(0x8B, 0x85, tcc_off);         /* mov eax, dword ptr [rbp - tcc_off] */
 	EMIT3(0x83, 0xF8, MAX_TAIL_CALL_CNT);     /* cmp eax, MAX_TAIL_CALL_CNT */
-#define OFFSET2 (off2 + RETPOLINE_RCX_BPF_JIT_SIZE)
-	EMIT2(X86_JA, OFFSET2);                   /* ja out */
+
+	offset = ctx->tail_call_indirect_label - (prog + 2 - start);
+	EMIT2(X86_JA, offset);                    /* ja out */
 	EMIT3(0x83, 0xC0, 0x01);                  /* add eax, 1 */
 	EMIT2_off32(0x89, 0x85, tcc_off);         /* mov dword ptr [rbp - tcc_off], eax */
 
@@ -470,12 +447,11 @@ static void emit_bpf_tail_call_indirect(
 	 *	goto out;
 	 */
 	EMIT3(0x48, 0x85, 0xC9);                  /* test rcx,rcx */
-#define OFFSET3 (off3 + RETPOLINE_RCX_BPF_JIT_SIZE)
-	EMIT2(X86_JE, OFFSET3);                   /* je out */
 
-	*pprog = prog;
-	pop_callee_regs(pprog, callee_regs_used);
-	prog = *pprog;
+	offset = ctx->tail_call_indirect_label - (prog + 2 - start);
+	EMIT2(X86_JE, offset);                    /* je out */
+
+	pop_callee_regs(&prog, callee_regs_used);
 
 	EMIT1(0x58);                              /* pop rax */
 	if (stack_depth)
@@ -495,38 +471,18 @@ static void emit_bpf_tail_call_indirect(
 	RETPOLINE_RCX_BPF_JIT();
 
 	/* out: */
+	ctx->tail_call_indirect_label = prog - start;
 	*pprog = prog;
 }
 
 static void emit_bpf_tail_call_direct(struct bpf_jit_poke_descriptor *poke,
-				      u8 **pprog, int addr, u8 *image,
-				      bool *callee_regs_used, u32 stack_depth)
+				      u8 **pprog, u8 *ip,
+				      bool *callee_regs_used, u32 stack_depth,
+				      struct jit_context *ctx)
 {
 	int tcc_off = -4 - round_up(stack_depth, 8);
-	u8 *prog = *pprog;
-	int pop_bytes = 0;
-	int off1 = 20;
-	int poke_off;
-
-	/* count the additional bytes used for popping callee regs to stack
-	 * that need to be taken into account for jump offset that is used for
-	 * bailing out from of the tail call when limit is reached
-	 */
-	pop_bytes = get_pop_bytes(callee_regs_used);
-	off1 += pop_bytes;
-
-	/*
-	 * total bytes for:
-	 * - nop5/ jmpq $off
-	 * - pop callee regs
-	 * - sub rsp, $val if depth > 0
-	 * - pop rax
-	 */
-	poke_off = X86_PATCH_SIZE + pop_bytes + 1;
-	if (stack_depth) {
-		poke_off += 7;
-		off1 += 7;
-	}
+	u8 *prog = *pprog, *start = *pprog;
+	int offset;
 
 	/*
 	 * if (tail_call_cnt > MAX_TAIL_CALL_CNT)
@@ -534,28 +490,30 @@ static void emit_bpf_tail_call_direct(st
 	 */
 	EMIT2_off32(0x8B, 0x85, tcc_off);             /* mov eax, dword ptr [rbp - tcc_off] */
 	EMIT3(0x83, 0xF8, MAX_TAIL_CALL_CNT);         /* cmp eax, MAX_TAIL_CALL_CNT */
-	EMIT2(X86_JA, off1);                          /* ja out */
+
+	offset = ctx->tail_call_direct_label - (prog + 2 - start);
+	EMIT2(X86_JA, offset);                        /* ja out */
 	EMIT3(0x83, 0xC0, 0x01);                      /* add eax, 1 */
 	EMIT2_off32(0x89, 0x85, tcc_off);             /* mov dword ptr [rbp - tcc_off], eax */
 
-	poke->tailcall_bypass = image + (addr - poke_off - X86_PATCH_SIZE);
+	poke->tailcall_bypass = ip + (prog - start);
 	poke->adj_off = X86_TAIL_CALL_OFFSET;
-	poke->tailcall_target = image + (addr - X86_PATCH_SIZE);
+	poke->tailcall_target = ip + ctx->tail_call_direct_label - X86_PATCH_SIZE;
 	poke->bypass_addr = (u8 *)poke->tailcall_target + X86_PATCH_SIZE;
 
 	emit_jump(&prog, (u8 *)poke->tailcall_target + X86_PATCH_SIZE,
 		  poke->tailcall_bypass);
 
-	*pprog = prog;
-	pop_callee_regs(pprog, callee_regs_used);
-	prog = *pprog;
+	pop_callee_regs(&prog, callee_regs_used);
 	EMIT1(0x58);                                  /* pop rax */
 	if (stack_depth)
 		EMIT3_off32(0x48, 0x81, 0xC4, round_up(stack_depth, 8));
 
 	memcpy(prog, x86_nops[5], X86_PATCH_SIZE);
 	prog += X86_PATCH_SIZE;
+
 	/* out: */
+	ctx->tail_call_direct_label = prog - start;
 
 	*pprog = prog;
 }
@@ -1404,13 +1362,16 @@ st:			if (is_imm8(insn->off))
 		case BPF_JMP | BPF_TAIL_CALL:
 			if (imm32)
 				emit_bpf_tail_call_direct(&bpf_prog->aux->poke_tab[imm32 - 1],
-							  &prog, addrs[i], image,
+							  &prog, image + addrs[i - 1],
 							  callee_regs_used,
-							  bpf_prog->aux->stack_depth);
+							  bpf_prog->aux->stack_depth,
+							  ctx);
 			else
 				emit_bpf_tail_call_indirect(&prog,
 							    callee_regs_used,
-							    bpf_prog->aux->stack_depth);
+							    bpf_prog->aux->stack_depth,
+							    image + addrs[i - 1],
+							    ctx);
 			break;
 
 			/* cond jump */



^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 16/16] bpf,x86: Respect X86_FEATURE_RETPOLINE*
  2021-10-26 12:01 [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
                   ` (14 preceding siblings ...)
  2021-10-26 12:01 ` [PATCH v3 15/16] bpf, x86: Simplify computing label offsets Peter Zijlstra
@ 2021-10-26 12:01 ` Peter Zijlstra
  2021-10-29  8:02   ` [tip: objtool/core] " tip-bot2 for Peter Zijlstra
  2021-10-26 18:26 ` [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic Alexei Starovoitov
                   ` (3 subsequent siblings)
  19 siblings, 1 reply; 48+ messages in thread
From: Peter Zijlstra @ 2021-10-26 12:01 UTC (permalink / raw)
  To: x86, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, alexei.starovoitov, ndesaulniers, bpf

Current BPF codegen doesn't respect X86_FEATURE_RETPOLINE* flags and
unconditionally emits a thunk call, this is sub-optimal and doesn't
match the regular, compiler generated, code.

Update the i386 JIT to emit code equal to what the compiler emits for
the regular kernel text (IOW. a plain THUNK call).

Update the x86_64 JIT to emit code similar to the result of compiler
and kernel rewrites as according to X86_FEATURE_RETPOLINE* flags.
Inlining RETPOLINE_AMD (lfence; jmp *%reg) and !RETPOLINE (jmp *%reg),
while doing a THUNK call for RETPOLINE.

This removes the hard-coded retpoline thunks and shrinks the generated
code. Leaving a single retpoline thunk definition in the kernel.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/nospec-branch.h |   59 -----------------------------------
 arch/x86/net/bpf_jit_comp.c          |   46 +++++++++++++--------------
 arch/x86/net/bpf_jit_comp32.c        |   22 +++++++++++--
 3 files changed, 41 insertions(+), 86 deletions(-)

--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -316,63 +316,4 @@ static inline void mds_idle_clear_cpu_bu
 
 #endif /* __ASSEMBLY__ */
 
-/*
- * Below is used in the eBPF JIT compiler and emits the byte sequence
- * for the following assembly:
- *
- * With retpolines configured:
- *
- *    callq do_rop
- *  spec_trap:
- *    pause
- *    lfence
- *    jmp spec_trap
- *  do_rop:
- *    mov %rcx,(%rsp) for x86_64
- *    mov %edx,(%esp) for x86_32
- *    retq
- *
- * Without retpolines configured:
- *
- *    jmp *%rcx for x86_64
- *    jmp *%edx for x86_32
- */
-#ifdef CONFIG_RETPOLINE
-# ifdef CONFIG_X86_64
-#  define RETPOLINE_RCX_BPF_JIT_SIZE	17
-#  define RETPOLINE_RCX_BPF_JIT()				\
-do {								\
-	EMIT1_off32(0xE8, 7);	 /* callq do_rop */		\
-	/* spec_trap: */					\
-	EMIT2(0xF3, 0x90);       /* pause */			\
-	EMIT3(0x0F, 0xAE, 0xE8); /* lfence */			\
-	EMIT2(0xEB, 0xF9);       /* jmp spec_trap */		\
-	/* do_rop: */						\
-	EMIT4(0x48, 0x89, 0x0C, 0x24); /* mov %rcx,(%rsp) */	\
-	EMIT1(0xC3);             /* retq */			\
-} while (0)
-# else /* !CONFIG_X86_64 */
-#  define RETPOLINE_EDX_BPF_JIT()				\
-do {								\
-	EMIT1_off32(0xE8, 7);	 /* call do_rop */		\
-	/* spec_trap: */					\
-	EMIT2(0xF3, 0x90);       /* pause */			\
-	EMIT3(0x0F, 0xAE, 0xE8); /* lfence */			\
-	EMIT2(0xEB, 0xF9);       /* jmp spec_trap */		\
-	/* do_rop: */						\
-	EMIT3(0x89, 0x14, 0x24); /* mov %edx,(%esp) */		\
-	EMIT1(0xC3);             /* ret */			\
-} while (0)
-# endif
-#else /* !CONFIG_RETPOLINE */
-# ifdef CONFIG_X86_64
-#  define RETPOLINE_RCX_BPF_JIT_SIZE	2
-#  define RETPOLINE_RCX_BPF_JIT()				\
-	EMIT2(0xFF, 0xE1);       /* jmp *%rcx */
-# else /* !CONFIG_X86_64 */
-#  define RETPOLINE_EDX_BPF_JIT()				\
-	EMIT2(0xFF, 0xE2)        /* jmp *%edx */
-# endif
-#endif
-
 #endif /* _ASM_X86_NOSPEC_BRANCH_H_ */
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -387,6 +387,25 @@ int bpf_arch_text_poke(void *ip, enum bp
 	return __bpf_arch_text_poke(ip, t, old_addr, new_addr, true);
 }
 
+#define EMIT_LFENCE()	EMIT3(0x0F, 0xAE, 0xE8)
+
+static void emit_indirect_jump(u8 **pprog, int reg, u8 *ip)
+{
+	u8 *prog = *pprog;
+
+#ifdef CONFIG_RETPOLINE
+	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE_AMD)) {
+		EMIT_LFENCE();
+		EMIT2(0xFF, 0xE0 + reg);
+	} else if (cpu_feature_enabled(X86_FEATURE_RETPOLINE)) {
+		emit_jump(&prog, &__x86_indirect_thunk_array[reg], ip);
+	} else
+#endif
+	EMIT2(0xFF, 0xE0 + reg);
+
+	*pprog = prog;
+}
+
 /*
  * Generate the following code:
  *
@@ -468,7 +487,7 @@ static void emit_bpf_tail_call_indirect(
 	 * rdi == ctx (1st arg)
 	 * rcx == prog->bpf_func + X86_TAIL_CALL_OFFSET
 	 */
-	RETPOLINE_RCX_BPF_JIT();
+	emit_indirect_jump(&prog, 1 /* rcx */, ip + (prog - start));
 
 	/* out: */
 	ctx->tail_call_indirect_label = prog - start;
@@ -1177,8 +1196,7 @@ static int do_jit(struct bpf_prog *bpf_p
 			/* speculation barrier */
 		case BPF_ST | BPF_NOSPEC:
 			if (boot_cpu_has(X86_FEATURE_XMM2))
-				/* Emit 'lfence' */
-				EMIT3(0x0F, 0xAE, 0xE8);
+				EMIT_LFENCE();
 			break;
 
 			/* ST: *(u8*)(dst_reg + off) = imm */
@@ -2077,24 +2095,6 @@ int arch_prepare_bpf_trampoline(struct b
 	return ret;
 }
 
-static int emit_fallback_jump(u8 **pprog)
-{
-	u8 *prog = *pprog;
-	int err = 0;
-
-#ifdef CONFIG_RETPOLINE
-	/* Note that this assumes the the compiler uses external
-	 * thunks for indirect calls. Both clang and GCC use the same
-	 * naming convention for external thunks.
-	 */
-	err = emit_jump(&prog, __x86_indirect_thunk_rdx, prog);
-#else
-	EMIT2(0xFF, 0xE2);	/* jmp rdx */
-#endif
-	*pprog = prog;
-	return err;
-}
-
 static int emit_bpf_dispatcher(u8 **pprog, int a, int b, s64 *progs)
 {
 	u8 *jg_reloc, *prog = *pprog;
@@ -2116,9 +2116,7 @@ static int emit_bpf_dispatcher(u8 **ppro
 		if (err)
 			return err;
 
-		err = emit_fallback_jump(&prog);	/* jmp thunk/indirect */
-		if (err)
-			return err;
+		emit_indirect_jump(&prog, 2 /* rdx */, prog);
 
 		*pprog = prog;
 		return 0;
--- a/arch/x86/net/bpf_jit_comp32.c
+++ b/arch/x86/net/bpf_jit_comp32.c
@@ -15,6 +15,7 @@
 #include <asm/cacheflush.h>
 #include <asm/set_memory.h>
 #include <asm/nospec-branch.h>
+#include <asm/asm-prototypes.h>
 #include <linux/bpf.h>
 
 /*
@@ -1267,6 +1268,21 @@ static void emit_epilogue(u8 **pprog, u3
 	*pprog = prog;
 }
 
+static int emit_jmp_edx(u8 **pprog, u8 *ip)
+{
+	u8 *prog = *pprog;
+	int cnt = 0;
+
+#ifdef CONFIG_RETPOLINE
+	EMIT1_off32(0xE9, (u8 *)__x86_indirect_thunk_edx - (ip + 5));
+#else
+	EMIT2(0xFF, 0xE2);
+#endif
+	*pprog = prog;
+
+	return cnt;
+}
+
 /*
  * Generate the following code:
  * ... bpf_tail_call(void *ctx, struct bpf_array *array, u64 index) ...
@@ -1280,7 +1296,7 @@ static void emit_epilogue(u8 **pprog, u3
  *   goto *(prog->bpf_func + prologue_size);
  * out:
  */
-static void emit_bpf_tail_call(u8 **pprog)
+static void emit_bpf_tail_call(u8 **pprog, u8 *ip)
 {
 	u8 *prog = *pprog;
 	int cnt = 0;
@@ -1362,7 +1378,7 @@ static void emit_bpf_tail_call(u8 **ppro
 	 * eax == ctx (1st arg)
 	 * edx == prog->bpf_func + prologue_size
 	 */
-	RETPOLINE_EDX_BPF_JIT();
+	cnt += emit_jmp_edx(&prog, ip + cnt);
 
 	if (jmp_label1 == -1)
 		jmp_label1 = cnt;
@@ -2122,7 +2138,7 @@ static int do_jit(struct bpf_prog *bpf_p
 			break;
 		}
 		case BPF_JMP | BPF_TAIL_CALL:
-			emit_bpf_tail_call(&prog);
+			emit_bpf_tail_call(&prog, image + addrs[i - 1]);
 			break;
 
 		/* cond jump */



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic
  2021-10-26 12:01 [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
                   ` (15 preceding siblings ...)
  2021-10-26 12:01 ` [PATCH v3 16/16] bpf,x86: Respect X86_FEATURE_RETPOLINE* Peter Zijlstra
@ 2021-10-26 18:26 ` Alexei Starovoitov
  2021-10-26 18:45   ` Peter Zijlstra
  2021-10-28  5:17 ` Josh Poimboeuf
                   ` (2 subsequent siblings)
  19 siblings, 1 reply; 48+ messages in thread
From: Alexei Starovoitov @ 2021-10-26 18:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: X86 ML, Josh Poimboeuf, Andrew Cooper, LKML, Nick Desaulniers, bpf

On Tue, Oct 26, 2021 at 5:05 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> Hi,
>
> These patches rewrite the way retpolines are rewritten. Currently objtool emits
> alternative entries for most retpoline calls. However trying to extend that led
> to trouble (ELF files are horrid).
>
> Therefore completely overhaul this and have objtool emit a .retpoline_sites
> section that lists all compiler generated retpoline thunk calls. Then the
> kernel can do with them as it pleases.
>
> Notably it will:
>
>  - rewrite them to indirect instructions for !RETPOLINE
>  - rewrite them to lfence; indirect; for RETPOLINE_AMD,
>    where size allows (boo clang!)
>
> Specifically, the !RETPOLINE case can now also deal with the clang-special
> conditional-indirect-tail-call:
>
>   Jcc __x86_indirect_thunk_\reg.
>
> Finally, also update the x86 BPF jit to catch up to recent times and do these
> same things.
>
> All this should help improve performance by removing an indirection.
>
> Patches can (soon) be found here:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git objtool/core
>
> Changes since v2:
>
>  - rewrite the __x86_indirect_thunk_array[] stuff again
>  - rewrite the retpoline,amd rewrite logic, it now also supports
>    rewriting the Jcc case, if the original instruction is long enough, but
>    more importantly, it's simpler code.
>  - bpf label simplification patch
>  - random assorted cleanups
>  - actually managed to get bpf selftests working

Great.
The patchset didn't go through BPF CI though.
See
https://patchwork.kernel.org/project/netdevbpf/patch/20211026120309.658539311@infradead.org/

It's a merge conflict. The patchset failed to apply to both bpf and
bpf-next trees:

Cmd('git') failed due to: exit code(128)
  cmdline: git am -3
  stdout: 'Applying: objtool: Classify symbols
Patch failed at 0001 objtool: Classify symbols
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".'
  stderr: 'error: sha1 information is lacking or useless
(tools/objtool/check.c).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch=diff' to see the failed patch'

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic
  2021-10-26 18:26 ` [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic Alexei Starovoitov
@ 2021-10-26 18:45   ` Peter Zijlstra
  2021-10-26 20:00     ` Alexei Starovoitov
  0 siblings, 1 reply; 48+ messages in thread
From: Peter Zijlstra @ 2021-10-26 18:45 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: X86 ML, Josh Poimboeuf, Andrew Cooper, LKML, Nick Desaulniers, bpf

On Tue, Oct 26, 2021 at 11:26:57AM -0700, Alexei Starovoitov wrote:

> It's a merge conflict. The patchset failed to apply to both bpf and
> bpf-next trees:

Figures :/ I suspect it relies on tip/objtool/core at the very least and
possibly some of the x86 trees as well.

I can locally merge tip/master with bpf, but getting a CI to do that
might be tricky.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic
  2021-10-26 18:45   ` Peter Zijlstra
@ 2021-10-26 20:00     ` Alexei Starovoitov
  2021-10-26 21:05       ` Peter Zijlstra
  0 siblings, 1 reply; 48+ messages in thread
From: Alexei Starovoitov @ 2021-10-26 20:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: X86 ML, Josh Poimboeuf, Andrew Cooper, LKML, Nick Desaulniers, bpf

On Tue, Oct 26, 2021 at 11:45 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Oct 26, 2021 at 11:26:57AM -0700, Alexei Starovoitov wrote:
>
> > It's a merge conflict. The patchset failed to apply to both bpf and
> > bpf-next trees:
>
> Figures :/ I suspect it relies on tip/objtool/core at the very least and
> possibly some of the x86 trees as well.
>
> I can locally merge tip/master with bpf, but getting a CI to do that
> might be tricky.

We have an ability in CI to supply few additional patches on top bpf/bpf-next
trees, but that's usually done for the cases where we've merged a fix into
one tree, but it's needed in both while bpf->net->linus->net-next->bpf-next
circle is still pending.

Does tip/objtool/core dependency relevant for this set?
Can you rebase the current set on top of bpf-next and send it to the list
just to get CI to run it? We won't be merging it into bpf-next, of course.
I'm mainly interested in seeing all that additional tests passing that
we have in bpf-next.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic
  2021-10-26 20:00     ` Alexei Starovoitov
@ 2021-10-26 21:05       ` Peter Zijlstra
  2021-10-26 21:05         ` Alexei Starovoitov
  0 siblings, 1 reply; 48+ messages in thread
From: Peter Zijlstra @ 2021-10-26 21:05 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: X86 ML, Josh Poimboeuf, Andrew Cooper, LKML, Nick Desaulniers, bpf

On Tue, Oct 26, 2021 at 01:00:04PM -0700, Alexei Starovoitov wrote:
> On Tue, Oct 26, 2021 at 11:45 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Tue, Oct 26, 2021 at 11:26:57AM -0700, Alexei Starovoitov wrote:
> >
> > > It's a merge conflict. The patchset failed to apply to both bpf and
> > > bpf-next trees:
> >
> > Figures :/ I suspect it relies on tip/objtool/core at the very least and
> > possibly some of the x86 trees as well.
> >
> > I can locally merge tip/master with bpf, but getting a CI to do that
> > might be tricky.
> 
> We have an ability in CI to supply few additional patches on top bpf/bpf-next
> trees, but that's usually done for the cases where we've merged a fix into
> one tree, but it's needed in both while bpf->net->linus->net-next->bpf-next
> circle is still pending.
> 
> Does tip/objtool/core dependency relevant for this set?
> Can you rebase the current set on top of bpf-next and send it to the list
> just to get CI to run it? We won't be merging it into bpf-next, of course.
> I'm mainly interested in seeing all that additional tests passing that
> we have in bpf-next.

I should be able to rebase it just to that, let me try that in the am
though, brain is fairly fried atm. Do you really want me to post it to
the list, or is a git repo good enough?

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic
  2021-10-26 21:05       ` Peter Zijlstra
@ 2021-10-26 21:05         ` Alexei Starovoitov
  2021-10-27  9:00           ` Peter Zijlstra
  0 siblings, 1 reply; 48+ messages in thread
From: Alexei Starovoitov @ 2021-10-26 21:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: X86 ML, Josh Poimboeuf, Andrew Cooper, LKML, Nick Desaulniers, bpf

On Tue, Oct 26, 2021 at 2:05 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Oct 26, 2021 at 01:00:04PM -0700, Alexei Starovoitov wrote:
> > On Tue, Oct 26, 2021 at 11:45 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Tue, Oct 26, 2021 at 11:26:57AM -0700, Alexei Starovoitov wrote:
> > >
> > > > It's a merge conflict. The patchset failed to apply to both bpf and
> > > > bpf-next trees:
> > >
> > > Figures :/ I suspect it relies on tip/objtool/core at the very least and
> > > possibly some of the x86 trees as well.
> > >
> > > I can locally merge tip/master with bpf, but getting a CI to do that
> > > might be tricky.
> >
> > We have an ability in CI to supply few additional patches on top bpf/bpf-next
> > trees, but that's usually done for the cases where we've merged a fix into
> > one tree, but it's needed in both while bpf->net->linus->net-next->bpf-next
> > circle is still pending.
> >
> > Does tip/objtool/core dependency relevant for this set?
> > Can you rebase the current set on top of bpf-next and send it to the list
> > just to get CI to run it? We won't be merging it into bpf-next, of course.
> > I'm mainly interested in seeing all that additional tests passing that
> > we have in bpf-next.
>
> I should be able to rebase it just to that, let me try that in the am
> though, brain is fairly fried atm. Do you really want me to post it to
> the list, or is a git repo good enough?

Please post it. CI cannot pull it from the repo.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic
  2021-10-26 21:05         ` Alexei Starovoitov
@ 2021-10-27  9:00           ` Peter Zijlstra
  2021-10-27 17:32             ` Alexei Starovoitov
  0 siblings, 1 reply; 48+ messages in thread
From: Peter Zijlstra @ 2021-10-27  9:00 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: X86 ML, Josh Poimboeuf, Andrew Cooper, LKML, Nick Desaulniers, bpf

On Tue, Oct 26, 2021 at 02:05:55PM -0700, Alexei Starovoitov wrote:

> Please post it. CI cannot pull it from the repo.

Done:

  https://lore.kernel.org/bpf/20211027085243.008677168@infradead.org/T/#t

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic
  2021-10-27  9:00           ` Peter Zijlstra
@ 2021-10-27 17:32             ` Alexei Starovoitov
  0 siblings, 0 replies; 48+ messages in thread
From: Alexei Starovoitov @ 2021-10-27 17:32 UTC (permalink / raw)
  To: Peter Zijlstra, Daniel Borkmann, Andrii Nakryiko,
	David S. Miller, Jakub Kicinski
  Cc: X86 ML, Josh Poimboeuf, Andrew Cooper, LKML, Nick Desaulniers, bpf

On Wed, Oct 27, 2021 at 2:02 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Oct 26, 2021 at 02:05:55PM -0700, Alexei Starovoitov wrote:
>
> > Please post it. CI cannot pull it from the repo.
>
> Done:
>
>   https://lore.kernel.org/bpf/20211027085243.008677168@infradead.org/T/#t

Perfect. Thanks!
Looks like vger got into 'delete all users' mode.
Looks like my and many other emails were automatically unsubscribed
from many mailing lists.
bpf@vger, netdev@vger, linux-kernel@vger user count looks very low
compared to a few weeks ago.
I doubt all these users decided to unsubscribe themselves.

Anyway looks like BPF CI doesn't see the patchset yet for some reason
(we're debugging),
but I've tested your patchset manually the same way CI would do
and reviewed most of the patches.
Please add:
Acked-by: Alexei Starovoitov <ast@kernel.org>
to bpf patches 16 and 17
and
Tested-by: Alexei Starovoitov <ast@kernel.org>
for the whole set.

Thanks!

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 10/16] x86/alternative: Implement .retpoline_sites support
  2021-10-26 12:01 ` [PATCH v3 10/16] x86/alternative: Implement .retpoline_sites support Peter Zijlstra
@ 2021-10-27 17:38   ` Borislav Petkov
  2021-10-27 19:10     ` Peter Zijlstra
  2021-10-29  8:02   ` [tip: objtool/core] " tip-bot2 for Peter Zijlstra
  1 sibling, 1 reply; 48+ messages in thread
From: Borislav Petkov @ 2021-10-27 17:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, jpoimboe, andrew.cooper3, linux-kernel, alexei.starovoitov,
	ndesaulniers, bpf

On Tue, Oct 26, 2021 at 02:01:42PM +0200, Peter Zijlstra wrote:
> +static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
> +{
> +	retpoline_thunk_t *target;
> +	int reg, i = 0;
> +
> +	target = addr + insn->length + insn->immediate.value;
> +	reg = target - __x86_indirect_thunk_array;
> +
> +	if (WARN_ON_ONCE(reg & ~0xf))
> +		return -1;
> +
> +	/* If anyone ever does: CALL/JMP *%rsp, we're in deep trouble. */
> +	BUG_ON(reg == 4);
> +
> +	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE))
> +		return -1;

I wanna say this should be the first thing being checked on function
entry but I get the feeling you'll be looking at other X86_FEATURE bits
in future patches... /me goes into the future...

yap, you do. Lemme look at the whole thing first then.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 10/16] x86/alternative: Implement .retpoline_sites support
  2021-10-27 17:38   ` Borislav Petkov
@ 2021-10-27 19:10     ` Peter Zijlstra
  0 siblings, 0 replies; 48+ messages in thread
From: Peter Zijlstra @ 2021-10-27 19:10 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: x86, jpoimboe, andrew.cooper3, linux-kernel, alexei.starovoitov,
	ndesaulniers, bpf

On Wed, Oct 27, 2021 at 07:38:59PM +0200, Borislav Petkov wrote:
> On Tue, Oct 26, 2021 at 02:01:42PM +0200, Peter Zijlstra wrote:
> > +static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
> > +{
> > +	retpoline_thunk_t *target;
> > +	int reg, i = 0;
> > +
> > +	target = addr + insn->length + insn->immediate.value;
> > +	reg = target - __x86_indirect_thunk_array;
> > +
> > +	if (WARN_ON_ONCE(reg & ~0xf))
> > +		return -1;
> > +
> > +	/* If anyone ever does: CALL/JMP *%rsp, we're in deep trouble. */
> > +	BUG_ON(reg == 4);
> > +
> > +	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE))
> > +		return -1;
> 
> I wanna say this should be the first thing being checked on function
> entry but I get the feeling you'll be looking at other X86_FEATURE bits
> in future patches... /me goes into the future...
> 
> yap, you do. Lemme look at the whole thing first then.

I wanted the sanity checks done unconditionally.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 01/16] objtool: Classify symbols
  2021-10-26 12:01 ` [PATCH v3 01/16] objtool: Classify symbols Peter Zijlstra
@ 2021-10-28  3:58   ` Josh Poimboeuf
  2021-10-29  8:03   ` [tip: objtool/core] " tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 48+ messages in thread
From: Josh Poimboeuf @ 2021-10-28  3:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, andrew.cooper3, linux-kernel, alexei.starovoitov, ndesaulniers, bpf

On Tue, Oct 26, 2021 at 02:01:33PM +0200, Peter Zijlstra wrote:
> @@ -55,7 +55,10 @@ struct symbol {
>  	unsigned int len;
>  	struct symbol *pfunc, *cfunc, *alias;
>  	bool uaccess_safe;
> -	bool static_call_tramp;
> +	u8 static_call_tramp : 1;
> +	u8 retpoline_thunk   : 1;
> +	u8 fentry            : 1;
> +	u8 kcov              : 1;
>  	struct list_head pv_target;

Might as well convert uaccess_safe to a bitfield?

-- 
Josh


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 02/16] objtool: Explicitly avoid self modifying code in .altinstr_replacement
  2021-10-26 12:01 ` [PATCH v3 02/16] objtool: Explicitly avoid self modifying code in .altinstr_replacement Peter Zijlstra
@ 2021-10-28  4:01   ` Josh Poimboeuf
  2021-10-29  8:03   ` [tip: objtool/core] " tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 48+ messages in thread
From: Josh Poimboeuf @ 2021-10-28  4:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, andrew.cooper3, linux-kernel, alexei.starovoitov, ndesaulniers, bpf

On Tue, Oct 26, 2021 at 02:01:34PM +0200, Peter Zijlstra wrote:
> +	/*
> +	 * Alternative replacement code is just template code which is
> +	 * sometimes copied to the original instruction, For now, don't

s/,/./

> +	 * annotate it. (In the future we might consider annotating the
> +	 * original instruction if/when it ever makes sense to do so.).

s/\.$//

> +	 */

-- 
Josh


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic
  2021-10-26 12:01 [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
                   ` (16 preceding siblings ...)
  2021-10-26 18:26 ` [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic Alexei Starovoitov
@ 2021-10-28  5:17 ` Josh Poimboeuf
  2021-10-28 17:43 ` Borislav Petkov
  2021-10-29 12:43 ` Miroslav Benes
  19 siblings, 0 replies; 48+ messages in thread
From: Josh Poimboeuf @ 2021-10-28  5:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, andrew.cooper3, linux-kernel, alexei.starovoitov, ndesaulniers, bpf

On Tue, Oct 26, 2021 at 02:01:32PM +0200, Peter Zijlstra wrote:
> Hi,
> 
> These patches rewrite the way retpolines are rewritten. Currently objtool emits
> alternative entries for most retpoline calls. However trying to extend that led
> to trouble (ELF files are horrid).
> 
> Therefore completely overhaul this and have objtool emit a .retpoline_sites
> section that lists all compiler generated retpoline thunk calls. Then the
> kernel can do with them as it pleases.
> 
> Notably it will:
> 
>  - rewrite them to indirect instructions for !RETPOLINE
>  - rewrite them to lfence; indirect; for RETPOLINE_AMD,
>    where size allows (boo clang!)
> 
> Specifically, the !RETPOLINE case can now also deal with the clang-special
> conditional-indirect-tail-call:
> 
>   Jcc __x86_indirect_thunk_\reg.
> 
> Finally, also update the x86 BPF jit to catch up to recent times and do these
> same things.
> 
> All this should help improve performance by removing an indirection.
> 
> Patches can (soon) be found here:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git objtool/core
> 
> Changes since v2:
> 
>  - rewrite the __x86_indirect_thunk_array[] stuff again
>  - rewrite the retpoline,amd rewrite logic, it now also supports
>    rewriting the Jcc case, if the original instruction is long enough, but
>    more importantly, it's simpler code.
>  - bpf label simplification patch
>  - random assorted cleanups
>  - actually managed to get bpf selftests working

Good stuff!

Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>

-- 
Josh


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 11/16] x86/alternative: Handle Jcc __x86_indirect_thunk_\reg
  2021-10-26 12:01 ` [PATCH v3 11/16] x86/alternative: Handle Jcc __x86_indirect_thunk_\reg Peter Zijlstra
@ 2021-10-28  9:29   ` Borislav Petkov
  2021-10-29  8:02   ` [tip: objtool/core] " tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 48+ messages in thread
From: Borislav Petkov @ 2021-10-28  9:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, jpoimboe, andrew.cooper3, linux-kernel, alexei.starovoitov,
	ndesaulniers, bpf

On Tue, Oct 26, 2021 at 02:01:43PM +0200, Peter Zijlstra wrote:
> +	op = insn->opcode.bytes[0];
> +
> +	/*
> +	 * Convert:
> +	 *
> +	 *   Jcc.d32 __x86_indirect_thunk_\reg
> +	 *
> +	 * into:
> +	 *
> +	 *   Jncc.d8 1f
> +	 *   JMP *%\reg
> +	 *   NOP
> +	 * 1:
> +	 */

Let's explain the second part of the test better:

	/* Jcc opcodes are in the range 0x80-0x8f */

Yeah, you have that range check below but still.

> +	if (op == 0x0f && (insn->opcode.bytes[1] & 0xf0) == 0x80) {
> +		cc = insn->opcode.bytes[1] & 0xf;
> +		cc ^= 1; /* invert condition */
> +
> +		bytes[i++] = 0x70 + cc; /* Jcc.d8 */
> +		bytes[i++] = insn->length - 2;

maybe put at the end here: /* 2 == sizeof(Jcc.d8) */

to have it explicit what that 2 means.

But yeah, looks good.

Thx.

> +
> +		op = JMP32_INSN_OPCODE;
> +	}
> +
> +	ret = emit_indirect(op, reg, bytes + i);
> +	if (ret < 0)
> +		return ret;
> +	i += ret;
>  
>  	for (; i < insn->length;)
>  		bytes[i++] = BYTES_NOP1;
> @@ -443,6 +469,10 @@ void __init_or_module noinline apply_ret
>  		case JMP32_INSN_OPCODE:
>  			break;
>  
> +		case 0x0f: /* escape */
> +			if (op2 >= 0x80 && op2 <= 0x8f)
> +				break;
> +			fallthrough;
>  		default:
>  			WARN_ON_ONCE(1);
>  			continue;
> 
> 

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic
  2021-10-26 12:01 [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
                   ` (17 preceding siblings ...)
  2021-10-28  5:17 ` Josh Poimboeuf
@ 2021-10-28 17:43 ` Borislav Petkov
  2021-10-29 12:43 ` Miroslav Benes
  19 siblings, 0 replies; 48+ messages in thread
From: Borislav Petkov @ 2021-10-28 17:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, jpoimboe, andrew.cooper3, linux-kernel, alexei.starovoitov,
	ndesaulniers, bpf

On Tue, Oct 26, 2021 at 02:01:32PM +0200, Peter Zijlstra wrote:
> Hi,
> 
> These patches rewrite the way retpolines are rewritten. Currently objtool emits
> alternative entries for most retpoline calls. However trying to extend that led
> to trouble (ELF files are horrid).
> 
> Therefore completely overhaul this and have objtool emit a .retpoline_sites
> section that lists all compiler generated retpoline thunk calls. Then the
> kernel can do with them as it pleases.
> 
> Notably it will:
> 
>  - rewrite them to indirect instructions for !RETPOLINE
>  - rewrite them to lfence; indirect; for RETPOLINE_AMD,
>    where size allows (boo clang!)
> 
> Specifically, the !RETPOLINE case can now also deal with the clang-special
> conditional-indirect-tail-call:
> 
>   Jcc __x86_indirect_thunk_\reg.
> 
> Finally, also update the x86 BPF jit to catch up to recent times and do these
> same things.
> 
> All this should help improve performance by removing an indirection.
> 
> Patches can (soon) be found here:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git objtool/core
> 
> Changes since v2:
> 
>  - rewrite the __x86_indirect_thunk_array[] stuff again
>  - rewrite the retpoline,amd rewrite logic, it now also supports
>    rewriting the Jcc case, if the original instruction is long enough, but
>    more importantly, it's simpler code.
>  - bpf label simplification patch
>  - random assorted cleanups
>  - actually managed to get bpf selftests working
> 
> ---
>  arch/um/kernel/um_arch.c                |   4 +
>  arch/x86/include/asm/GEN-for-each-reg.h |  14 ++-
>  arch/x86/include/asm/alternative.h      |   1 +
>  arch/x86/include/asm/asm-prototypes.h   |  18 ---
>  arch/x86/include/asm/nospec-branch.h    |  72 ++---------
>  arch/x86/kernel/alternative.c           | 189 ++++++++++++++++++++++++++++-
>  arch/x86/kernel/cpu/bugs.c              |   7 --
>  arch/x86/kernel/module.c                |   9 +-
>  arch/x86/kernel/vmlinux.lds.S           |  14 +++
>  arch/x86/lib/retpoline.S                |  56 ++-------
>  arch/x86/net/bpf_jit_comp.c             | 160 +++++++++---------------
>  arch/x86/net/bpf_jit_comp32.c           |  22 +++-
>  tools/objtool/arch/x86/decode.c         | 120 ------------------
>  tools/objtool/check.c                   | 208 ++++++++++++++++++++++----------
>  tools/objtool/elf.c                     |  84 -------------
>  tools/objtool/include/objtool/check.h   |   1 -
>  tools/objtool/include/objtool/elf.h     |   6 +-
>  tools/objtool/special.c                 |   8 --
>  18 files changed, 472 insertions(+), 521 deletions(-)

Ok, this all looks real nice, thx!

Reviewed-by: Borislav Petkov <bp@suse.de> 

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [tip: objtool/core] bpf,x86: Respect X86_FEATURE_RETPOLINE*
  2021-10-26 12:01 ` [PATCH v3 16/16] bpf,x86: Respect X86_FEATURE_RETPOLINE* Peter Zijlstra
@ 2021-10-29  8:02   ` tip-bot2 for Peter Zijlstra
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-10-29  8:02 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel),
	Borislav Petkov, Alexei Starovoitov, Josh Poimboeuf, x86,
	linux-kernel

The following commit has been merged into the objtool/core branch of tip:

Commit-ID:     87c87ecd00c54ecd677798cb49ef27329e0fab41
Gitweb:        https://git.kernel.org/tip/87c87ecd00c54ecd677798cb49ef27329e0fab41
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Tue, 26 Oct 2021 14:01:48 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 28 Oct 2021 23:25:29 +02:00

bpf,x86: Respect X86_FEATURE_RETPOLINE*

Current BPF codegen doesn't respect X86_FEATURE_RETPOLINE* flags and
unconditionally emits a thunk call, this is sub-optimal and doesn't
match the regular, compiler generated, code.

Update the i386 JIT to emit code equal to what the compiler emits for
the regular kernel text (IOW. a plain THUNK call).

Update the x86_64 JIT to emit code similar to the result of compiler
and kernel rewrites as according to X86_FEATURE_RETPOLINE* flags.
Inlining RETPOLINE_AMD (lfence; jmp *%reg) and !RETPOLINE (jmp *%reg),
while doing a THUNK call for RETPOLINE.

This removes the hard-coded retpoline thunks and shrinks the generated
code. Leaving a single retpoline thunk definition in the kernel.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.614772675@infradead.org
---
 arch/x86/include/asm/nospec-branch.h | 59 +---------------------------
 arch/x86/net/bpf_jit_comp.c          | 46 ++++++++++-----------
 arch/x86/net/bpf_jit_comp32.c        | 22 ++++++++--
 3 files changed, 41 insertions(+), 86 deletions(-)

diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index e22aedb..cc74dc5 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -316,63 +316,4 @@ static inline void mds_idle_clear_cpu_buffers(void)
 
 #endif /* __ASSEMBLY__ */
 
-/*
- * Below is used in the eBPF JIT compiler and emits the byte sequence
- * for the following assembly:
- *
- * With retpolines configured:
- *
- *    callq do_rop
- *  spec_trap:
- *    pause
- *    lfence
- *    jmp spec_trap
- *  do_rop:
- *    mov %rcx,(%rsp) for x86_64
- *    mov %edx,(%esp) for x86_32
- *    retq
- *
- * Without retpolines configured:
- *
- *    jmp *%rcx for x86_64
- *    jmp *%edx for x86_32
- */
-#ifdef CONFIG_RETPOLINE
-# ifdef CONFIG_X86_64
-#  define RETPOLINE_RCX_BPF_JIT_SIZE	17
-#  define RETPOLINE_RCX_BPF_JIT()				\
-do {								\
-	EMIT1_off32(0xE8, 7);	 /* callq do_rop */		\
-	/* spec_trap: */					\
-	EMIT2(0xF3, 0x90);       /* pause */			\
-	EMIT3(0x0F, 0xAE, 0xE8); /* lfence */			\
-	EMIT2(0xEB, 0xF9);       /* jmp spec_trap */		\
-	/* do_rop: */						\
-	EMIT4(0x48, 0x89, 0x0C, 0x24); /* mov %rcx,(%rsp) */	\
-	EMIT1(0xC3);             /* retq */			\
-} while (0)
-# else /* !CONFIG_X86_64 */
-#  define RETPOLINE_EDX_BPF_JIT()				\
-do {								\
-	EMIT1_off32(0xE8, 7);	 /* call do_rop */		\
-	/* spec_trap: */					\
-	EMIT2(0xF3, 0x90);       /* pause */			\
-	EMIT3(0x0F, 0xAE, 0xE8); /* lfence */			\
-	EMIT2(0xEB, 0xF9);       /* jmp spec_trap */		\
-	/* do_rop: */						\
-	EMIT3(0x89, 0x14, 0x24); /* mov %edx,(%esp) */		\
-	EMIT1(0xC3);             /* ret */			\
-} while (0)
-# endif
-#else /* !CONFIG_RETPOLINE */
-# ifdef CONFIG_X86_64
-#  define RETPOLINE_RCX_BPF_JIT_SIZE	2
-#  define RETPOLINE_RCX_BPF_JIT()				\
-	EMIT2(0xFF, 0xE1);       /* jmp *%rcx */
-# else /* !CONFIG_X86_64 */
-#  define RETPOLINE_EDX_BPF_JIT()				\
-	EMIT2(0xFF, 0xE2)        /* jmp *%edx */
-# endif
-#endif
-
 #endif /* _ASM_X86_NOSPEC_BRANCH_H_ */
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index b5c5fb4..39c8025 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -387,6 +387,25 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
 	return __bpf_arch_text_poke(ip, t, old_addr, new_addr, true);
 }
 
+#define EMIT_LFENCE()	EMIT3(0x0F, 0xAE, 0xE8)
+
+static void emit_indirect_jump(u8 **pprog, int reg, u8 *ip)
+{
+	u8 *prog = *pprog;
+
+#ifdef CONFIG_RETPOLINE
+	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE_AMD)) {
+		EMIT_LFENCE();
+		EMIT2(0xFF, 0xE0 + reg);
+	} else if (cpu_feature_enabled(X86_FEATURE_RETPOLINE)) {
+		emit_jump(&prog, &__x86_indirect_thunk_array[reg], ip);
+	} else
+#endif
+	EMIT2(0xFF, 0xE0 + reg);
+
+	*pprog = prog;
+}
+
 /*
  * Generate the following code:
  *
@@ -468,7 +487,7 @@ static void emit_bpf_tail_call_indirect(u8 **pprog, bool *callee_regs_used,
 	 * rdi == ctx (1st arg)
 	 * rcx == prog->bpf_func + X86_TAIL_CALL_OFFSET
 	 */
-	RETPOLINE_RCX_BPF_JIT();
+	emit_indirect_jump(&prog, 1 /* rcx */, ip + (prog - start));
 
 	/* out: */
 	ctx->tail_call_indirect_label = prog - start;
@@ -1179,8 +1198,7 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
 			/* speculation barrier */
 		case BPF_ST | BPF_NOSPEC:
 			if (boot_cpu_has(X86_FEATURE_XMM2))
-				/* Emit 'lfence' */
-				EMIT3(0x0F, 0xAE, 0xE8);
+				EMIT_LFENCE();
 			break;
 
 			/* ST: *(u8*)(dst_reg + off) = imm */
@@ -2084,24 +2102,6 @@ cleanup:
 	return ret;
 }
 
-static int emit_fallback_jump(u8 **pprog)
-{
-	u8 *prog = *pprog;
-	int err = 0;
-
-#ifdef CONFIG_RETPOLINE
-	/* Note that this assumes the the compiler uses external
-	 * thunks for indirect calls. Both clang and GCC use the same
-	 * naming convention for external thunks.
-	 */
-	err = emit_jump(&prog, __x86_indirect_thunk_rdx, prog);
-#else
-	EMIT2(0xFF, 0xE2);	/* jmp rdx */
-#endif
-	*pprog = prog;
-	return err;
-}
-
 static int emit_bpf_dispatcher(u8 **pprog, int a, int b, s64 *progs)
 {
 	u8 *jg_reloc, *prog = *pprog;
@@ -2123,9 +2123,7 @@ static int emit_bpf_dispatcher(u8 **pprog, int a, int b, s64 *progs)
 		if (err)
 			return err;
 
-		err = emit_fallback_jump(&prog);	/* jmp thunk/indirect */
-		if (err)
-			return err;
+		emit_indirect_jump(&prog, 2 /* rdx */, prog);
 
 		*pprog = prog;
 		return 0;
diff --git a/arch/x86/net/bpf_jit_comp32.c b/arch/x86/net/bpf_jit_comp32.c
index 3bfda5f..da9b7cf 100644
--- a/arch/x86/net/bpf_jit_comp32.c
+++ b/arch/x86/net/bpf_jit_comp32.c
@@ -15,6 +15,7 @@
 #include <asm/cacheflush.h>
 #include <asm/set_memory.h>
 #include <asm/nospec-branch.h>
+#include <asm/asm-prototypes.h>
 #include <linux/bpf.h>
 
 /*
@@ -1267,6 +1268,21 @@ static void emit_epilogue(u8 **pprog, u32 stack_depth)
 	*pprog = prog;
 }
 
+static int emit_jmp_edx(u8 **pprog, u8 *ip)
+{
+	u8 *prog = *pprog;
+	int cnt = 0;
+
+#ifdef CONFIG_RETPOLINE
+	EMIT1_off32(0xE9, (u8 *)__x86_indirect_thunk_edx - (ip + 5));
+#else
+	EMIT2(0xFF, 0xE2);
+#endif
+	*pprog = prog;
+
+	return cnt;
+}
+
 /*
  * Generate the following code:
  * ... bpf_tail_call(void *ctx, struct bpf_array *array, u64 index) ...
@@ -1280,7 +1296,7 @@ static void emit_epilogue(u8 **pprog, u32 stack_depth)
  *   goto *(prog->bpf_func + prologue_size);
  * out:
  */
-static void emit_bpf_tail_call(u8 **pprog)
+static void emit_bpf_tail_call(u8 **pprog, u8 *ip)
 {
 	u8 *prog = *pprog;
 	int cnt = 0;
@@ -1362,7 +1378,7 @@ static void emit_bpf_tail_call(u8 **pprog)
 	 * eax == ctx (1st arg)
 	 * edx == prog->bpf_func + prologue_size
 	 */
-	RETPOLINE_EDX_BPF_JIT();
+	cnt += emit_jmp_edx(&prog, ip + cnt);
 
 	if (jmp_label1 == -1)
 		jmp_label1 = cnt;
@@ -2122,7 +2138,7 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
 			break;
 		}
 		case BPF_JMP | BPF_TAIL_CALL:
-			emit_bpf_tail_call(&prog);
+			emit_bpf_tail_call(&prog, image + addrs[i - 1]);
 			break;
 
 		/* cond jump */

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [tip: objtool/core] bpf,x86: Simplify computing label offsets
  2021-10-26 12:01 ` [PATCH v3 15/16] bpf, x86: Simplify computing label offsets Peter Zijlstra
@ 2021-10-29  8:02   ` tip-bot2 for Peter Zijlstra
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-10-29  8:02 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel),
	Borislav Petkov, Alexei Starovoitov, Josh Poimboeuf, x86,
	linux-kernel

The following commit has been merged into the objtool/core branch of tip:

Commit-ID:     dceba0817ca329868a15e2e1dd46eb6340b69206
Gitweb:        https://git.kernel.org/tip/dceba0817ca329868a15e2e1dd46eb6340b69206
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Tue, 26 Oct 2021 14:01:47 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 28 Oct 2021 23:25:29 +02:00

bpf,x86: Simplify computing label offsets

Take an idea from the 32bit JIT, which uses the multi-pass nature of
the JIT to compute the instruction offsets on a prior pass in order to
compute the relative jump offsets on a later pass.

Application to the x86_64 JIT is slightly more involved because the
offsets depend on program variables (such as callee_regs_used and
stack_depth) and hence the computed offsets need to be kept in the
context of the JIT.

This removes, IMO quite fragile, code that hard-codes the offsets and
tries to compute the length of variable parts of it.

Convert both emit_bpf_tail_call_*() functions which have an out: label
at the end. Additionally emit_bpt_tail_call_direct() also has a poke
table entry, for which it computes the offset from the end (and thus
already relies on the previous pass to have computed addrs[i]), also
convert this to be a forward based offset.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.552304864@infradead.org
---
 arch/x86/net/bpf_jit_comp.c | 123 +++++++++++------------------------
 1 file changed, 42 insertions(+), 81 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 7c03af6..b5c5fb4 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -224,6 +224,14 @@ static void jit_fill_hole(void *area, unsigned int size)
 
 struct jit_context {
 	int cleanup_addr; /* Epilogue code offset */
+
+	/*
+	 * Program specific offsets of labels in the code; these rely on the
+	 * JIT doing at least 2 passes, recording the position on the first
+	 * pass, only to generate the correct offset on the second pass.
+	 */
+	int tail_call_direct_label;
+	int tail_call_indirect_label;
 };
 
 /* Maximum number of bytes emitted while JITing one eBPF insn */
@@ -379,22 +387,6 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
 	return __bpf_arch_text_poke(ip, t, old_addr, new_addr, true);
 }
 
-static int get_pop_bytes(bool *callee_regs_used)
-{
-	int bytes = 0;
-
-	if (callee_regs_used[3])
-		bytes += 2;
-	if (callee_regs_used[2])
-		bytes += 2;
-	if (callee_regs_used[1])
-		bytes += 2;
-	if (callee_regs_used[0])
-		bytes += 1;
-
-	return bytes;
-}
-
 /*
  * Generate the following code:
  *
@@ -410,29 +402,12 @@ static int get_pop_bytes(bool *callee_regs_used)
  * out:
  */
 static void emit_bpf_tail_call_indirect(u8 **pprog, bool *callee_regs_used,
-					u32 stack_depth)
+					u32 stack_depth, u8 *ip,
+					struct jit_context *ctx)
 {
 	int tcc_off = -4 - round_up(stack_depth, 8);
-	u8 *prog = *pprog;
-	int pop_bytes = 0;
-	int off1 = 42;
-	int off2 = 31;
-	int off3 = 9;
-
-	/* count the additional bytes used for popping callee regs from stack
-	 * that need to be taken into account for each of the offsets that
-	 * are used for bailing out of the tail call
-	 */
-	pop_bytes = get_pop_bytes(callee_regs_used);
-	off1 += pop_bytes;
-	off2 += pop_bytes;
-	off3 += pop_bytes;
-
-	if (stack_depth) {
-		off1 += 7;
-		off2 += 7;
-		off3 += 7;
-	}
+	u8 *prog = *pprog, *start = *pprog;
+	int offset;
 
 	/*
 	 * rdi - pointer to ctx
@@ -447,8 +422,9 @@ static void emit_bpf_tail_call_indirect(u8 **pprog, bool *callee_regs_used,
 	EMIT2(0x89, 0xD2);                        /* mov edx, edx */
 	EMIT3(0x39, 0x56,                         /* cmp dword ptr [rsi + 16], edx */
 	      offsetof(struct bpf_array, map.max_entries));
-#define OFFSET1 (off1 + RETPOLINE_RCX_BPF_JIT_SIZE) /* Number of bytes to jump */
-	EMIT2(X86_JBE, OFFSET1);                  /* jbe out */
+
+	offset = ctx->tail_call_indirect_label - (prog + 2 - start);
+	EMIT2(X86_JBE, offset);                   /* jbe out */
 
 	/*
 	 * if (tail_call_cnt > MAX_TAIL_CALL_CNT)
@@ -456,8 +432,9 @@ static void emit_bpf_tail_call_indirect(u8 **pprog, bool *callee_regs_used,
 	 */
 	EMIT2_off32(0x8B, 0x85, tcc_off);         /* mov eax, dword ptr [rbp - tcc_off] */
 	EMIT3(0x83, 0xF8, MAX_TAIL_CALL_CNT);     /* cmp eax, MAX_TAIL_CALL_CNT */
-#define OFFSET2 (off2 + RETPOLINE_RCX_BPF_JIT_SIZE)
-	EMIT2(X86_JA, OFFSET2);                   /* ja out */
+
+	offset = ctx->tail_call_indirect_label - (prog + 2 - start);
+	EMIT2(X86_JA, offset);                    /* ja out */
 	EMIT3(0x83, 0xC0, 0x01);                  /* add eax, 1 */
 	EMIT2_off32(0x89, 0x85, tcc_off);         /* mov dword ptr [rbp - tcc_off], eax */
 
@@ -470,12 +447,11 @@ static void emit_bpf_tail_call_indirect(u8 **pprog, bool *callee_regs_used,
 	 *	goto out;
 	 */
 	EMIT3(0x48, 0x85, 0xC9);                  /* test rcx,rcx */
-#define OFFSET3 (off3 + RETPOLINE_RCX_BPF_JIT_SIZE)
-	EMIT2(X86_JE, OFFSET3);                   /* je out */
 
-	*pprog = prog;
-	pop_callee_regs(pprog, callee_regs_used);
-	prog = *pprog;
+	offset = ctx->tail_call_indirect_label - (prog + 2 - start);
+	EMIT2(X86_JE, offset);                    /* je out */
+
+	pop_callee_regs(&prog, callee_regs_used);
 
 	EMIT1(0x58);                              /* pop rax */
 	if (stack_depth)
@@ -495,38 +471,18 @@ static void emit_bpf_tail_call_indirect(u8 **pprog, bool *callee_regs_used,
 	RETPOLINE_RCX_BPF_JIT();
 
 	/* out: */
+	ctx->tail_call_indirect_label = prog - start;
 	*pprog = prog;
 }
 
 static void emit_bpf_tail_call_direct(struct bpf_jit_poke_descriptor *poke,
-				      u8 **pprog, int addr, u8 *image,
-				      bool *callee_regs_used, u32 stack_depth)
+				      u8 **pprog, u8 *ip,
+				      bool *callee_regs_used, u32 stack_depth,
+				      struct jit_context *ctx)
 {
 	int tcc_off = -4 - round_up(stack_depth, 8);
-	u8 *prog = *pprog;
-	int pop_bytes = 0;
-	int off1 = 20;
-	int poke_off;
-
-	/* count the additional bytes used for popping callee regs to stack
-	 * that need to be taken into account for jump offset that is used for
-	 * bailing out from of the tail call when limit is reached
-	 */
-	pop_bytes = get_pop_bytes(callee_regs_used);
-	off1 += pop_bytes;
-
-	/*
-	 * total bytes for:
-	 * - nop5/ jmpq $off
-	 * - pop callee regs
-	 * - sub rsp, $val if depth > 0
-	 * - pop rax
-	 */
-	poke_off = X86_PATCH_SIZE + pop_bytes + 1;
-	if (stack_depth) {
-		poke_off += 7;
-		off1 += 7;
-	}
+	u8 *prog = *pprog, *start = *pprog;
+	int offset;
 
 	/*
 	 * if (tail_call_cnt > MAX_TAIL_CALL_CNT)
@@ -534,28 +490,30 @@ static void emit_bpf_tail_call_direct(struct bpf_jit_poke_descriptor *poke,
 	 */
 	EMIT2_off32(0x8B, 0x85, tcc_off);             /* mov eax, dword ptr [rbp - tcc_off] */
 	EMIT3(0x83, 0xF8, MAX_TAIL_CALL_CNT);         /* cmp eax, MAX_TAIL_CALL_CNT */
-	EMIT2(X86_JA, off1);                          /* ja out */
+
+	offset = ctx->tail_call_direct_label - (prog + 2 - start);
+	EMIT2(X86_JA, offset);                        /* ja out */
 	EMIT3(0x83, 0xC0, 0x01);                      /* add eax, 1 */
 	EMIT2_off32(0x89, 0x85, tcc_off);             /* mov dword ptr [rbp - tcc_off], eax */
 
-	poke->tailcall_bypass = image + (addr - poke_off - X86_PATCH_SIZE);
+	poke->tailcall_bypass = ip + (prog - start);
 	poke->adj_off = X86_TAIL_CALL_OFFSET;
-	poke->tailcall_target = image + (addr - X86_PATCH_SIZE);
+	poke->tailcall_target = ip + ctx->tail_call_direct_label - X86_PATCH_SIZE;
 	poke->bypass_addr = (u8 *)poke->tailcall_target + X86_PATCH_SIZE;
 
 	emit_jump(&prog, (u8 *)poke->tailcall_target + X86_PATCH_SIZE,
 		  poke->tailcall_bypass);
 
-	*pprog = prog;
-	pop_callee_regs(pprog, callee_regs_used);
-	prog = *pprog;
+	pop_callee_regs(&prog, callee_regs_used);
 	EMIT1(0x58);                                  /* pop rax */
 	if (stack_depth)
 		EMIT3_off32(0x48, 0x81, 0xC4, round_up(stack_depth, 8));
 
 	memcpy(prog, x86_nops[5], X86_PATCH_SIZE);
 	prog += X86_PATCH_SIZE;
+
 	/* out: */
+	ctx->tail_call_direct_label = prog - start;
 
 	*pprog = prog;
 }
@@ -1411,13 +1369,16 @@ st:			if (is_imm8(insn->off))
 		case BPF_JMP | BPF_TAIL_CALL:
 			if (imm32)
 				emit_bpf_tail_call_direct(&bpf_prog->aux->poke_tab[imm32 - 1],
-							  &prog, addrs[i], image,
+							  &prog, image + addrs[i - 1],
 							  callee_regs_used,
-							  bpf_prog->aux->stack_depth);
+							  bpf_prog->aux->stack_depth,
+							  ctx);
 			else
 				emit_bpf_tail_call_indirect(&prog,
 							    callee_regs_used,
-							    bpf_prog->aux->stack_depth);
+							    bpf_prog->aux->stack_depth,
+							    image + addrs[i - 1],
+							    ctx);
 			break;
 
 			/* cond jump */

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [tip: objtool/core] x86,bugs: Unconditionally allow spectre_v2=retpoline,amd
  2021-10-26 12:01 ` [PATCH v3 14/16] x86,bugs: Unconditionally allow spectre_v2=retpoline,amd Peter Zijlstra
@ 2021-10-29  8:02   ` tip-bot2 for Peter Zijlstra
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-10-29  8:02 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel),
	Borislav Petkov, Josh Poimboeuf, Alexei Starovoitov, x86,
	linux-kernel

The following commit has been merged into the objtool/core branch of tip:

Commit-ID:     f8a66d608a3e471e1202778c2a36cbdc96bae73b
Gitweb:        https://git.kernel.org/tip/f8a66d608a3e471e1202778c2a36cbdc96bae73b
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Tue, 26 Oct 2021 14:01:46 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 28 Oct 2021 23:25:29 +02:00

x86,bugs: Unconditionally allow spectre_v2=retpoline,amd

Currently Linux prevents usage of retpoline,amd on !AMD hardware, this
is unfriendly and gets in the way of testing. Remove this restriction.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.487348118@infradead.org
---
 arch/x86/kernel/cpu/bugs.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index ecfca3b..ba43597 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -882,13 +882,6 @@ static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void)
 		return SPECTRE_V2_CMD_AUTO;
 	}
 
-	if (cmd == SPECTRE_V2_CMD_RETPOLINE_AMD &&
-	    boot_cpu_data.x86_vendor != X86_VENDOR_HYGON &&
-	    boot_cpu_data.x86_vendor != X86_VENDOR_AMD) {
-		pr_err("retpoline,amd selected but CPU is not AMD. Switching to AUTO select\n");
-		return SPECTRE_V2_CMD_AUTO;
-	}
-
 	spec_v2_print_cond(mitigation_options[i].option,
 			   mitigation_options[i].secure);
 	return cmd;

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [tip: objtool/core] x86/alternative: Add debug prints to apply_retpolines()
  2021-10-26 12:01 ` [PATCH v3 13/16] x86/alternative: Add debug prints to apply_retpolines() Peter Zijlstra
@ 2021-10-29  8:02   ` tip-bot2 for Peter Zijlstra
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-10-29  8:02 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel),
	Borislav Petkov, Josh Poimboeuf, Alexei Starovoitov, x86,
	linux-kernel

The following commit has been merged into the objtool/core branch of tip:

Commit-ID:     d4b5a5c993009ffeb5febe3b701da3faab6adb96
Gitweb:        https://git.kernel.org/tip/d4b5a5c993009ffeb5febe3b701da3faab6adb96
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Tue, 26 Oct 2021 14:01:45 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 28 Oct 2021 23:25:28 +02:00

x86/alternative: Add debug prints to apply_retpolines()

Make sure we can see the text changes when booting with
'debug-alternative'.

Example output:

 [ ] SMP alternatives: retpoline at: __traceiter_initcall_level+0x1f/0x30 (ffffffff8100066f) len: 5 to: __x86_indirect_thunk_rax+0x0/0x20
 [ ] SMP alternatives: ffffffff82603e58: [2:5) optimized NOPs: ff d0 0f 1f 00
 [ ] SMP alternatives: ffffffff8100066f: orig: e8 cc 30 00 01
 [ ] SMP alternatives: ffffffff8100066f: repl: ff d0 0f 1f 00

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.422273830@infradead.org
---
 arch/x86/kernel/alternative.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index c89824c..23fb4d5 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -492,9 +492,15 @@ void __init_or_module noinline apply_retpolines(s32 *start, s32 *end)
 			continue;
 		}
 
+		DPRINTK("retpoline at: %pS (%px) len: %d to: %pS",
+			addr, addr, insn.length,
+			addr + insn.length + insn.immediate.value);
+
 		len = patch_retpoline(addr, &insn, bytes);
 		if (len == insn.length) {
 			optimize_nops(bytes, len);
+			DUMP_BYTES(((u8*)addr),  len, "%px: orig: ", addr);
+			DUMP_BYTES(((u8*)bytes), len, "%px: repl: ", addr);
 			text_poke_early(addr, bytes, len);
 		}
 	}

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [tip: objtool/core] x86/alternative: Try inline spectre_v2=retpoline,amd
  2021-10-26 12:01 ` [PATCH v3 12/16] x86/alternative: Try inline spectre_v2=retpoline,amd Peter Zijlstra
@ 2021-10-29  8:02   ` tip-bot2 for Peter Zijlstra
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-10-29  8:02 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel),
	Borislav Petkov, Josh Poimboeuf, Alexei Starovoitov, x86,
	linux-kernel

The following commit has been merged into the objtool/core branch of tip:

Commit-ID:     bbe2df3f6b6da7848398d55b1311d58a16ec21e4
Gitweb:        https://git.kernel.org/tip/bbe2df3f6b6da7848398d55b1311d58a16ec21e4
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Tue, 26 Oct 2021 14:01:44 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 28 Oct 2021 23:25:28 +02:00

x86/alternative: Try inline spectre_v2=retpoline,amd

Try and replace retpoline thunk calls with:

  LFENCE
  CALL    *%\reg

for spectre_v2=retpoline,amd.

Specifically, the sequence above is 5 bytes for the low 8 registers,
but 6 bytes for the high 8 registers. This means that unless the
compilers prefix stuff the call with higher registers this replacement
will fail.

Luckily GCC strongly favours RAX for the indirect calls and most (95%+
for defconfig-x86_64) will be converted. OTOH clang strongly favours
R11 and almost nothing gets converted.

Note: it will also generate a correct replacement for the Jcc.d32
case, except unless the compilers start to prefix stuff that, it'll
never fit. Specifically:

  Jncc.d8 1f
  LFENCE
  JMP     *%\reg
1:

is 7-8 bytes long, where the original instruction in unpadded form is
only 6 bytes.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.359986601@infradead.org
---
 arch/x86/kernel/alternative.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 1dea2f6..c89824c 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -389,6 +389,7 @@ static int emit_indirect(int op, int reg, u8 *bytes)
  *
  *   CALL *%\reg
  *
+ * It also tries to inline spectre_v2=retpoline,amd when size permits.
  */
 static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
 {
@@ -405,7 +406,8 @@ static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
 	/* If anyone ever does: CALL/JMP *%rsp, we're in deep trouble. */
 	BUG_ON(reg == 4);
 
-	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE))
+	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE) &&
+	    !cpu_feature_enabled(X86_FEATURE_RETPOLINE_AMD))
 		return -1;
 
 	op = insn->opcode.bytes[0];
@@ -418,8 +420,9 @@ static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
 	 * into:
 	 *
 	 *   Jncc.d8 1f
+	 *   [ LFENCE ]
 	 *   JMP *%\reg
-	 *   NOP
+	 *   [ NOP ]
 	 * 1:
 	 */
 	/* Jcc.d32 second opcode byte is in the range: 0x80-0x8f */
@@ -434,6 +437,15 @@ static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
 		op = JMP32_INSN_OPCODE;
 	}
 
+	/*
+	 * For RETPOLINE_AMD: prepend the indirect CALL/JMP with an LFENCE.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE_AMD)) {
+		bytes[i++] = 0x0f;
+		bytes[i++] = 0xae;
+		bytes[i++] = 0xe8; /* LFENCE */
+	}
+
 	ret = emit_indirect(op, reg, bytes + i);
 	if (ret < 0)
 		return ret;

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [tip: objtool/core] x86/alternative: Implement .retpoline_sites support
  2021-10-26 12:01 ` [PATCH v3 10/16] x86/alternative: Implement .retpoline_sites support Peter Zijlstra
  2021-10-27 17:38   ` Borislav Petkov
@ 2021-10-29  8:02   ` tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 48+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-10-29  8:02 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel),
	Borislav Petkov, Josh Poimboeuf, Alexei Starovoitov, x86,
	linux-kernel

The following commit has been merged into the objtool/core branch of tip:

Commit-ID:     7508500900814d14e2e085cdc4e28142721abbdf
Gitweb:        https://git.kernel.org/tip/7508500900814d14e2e085cdc4e28142721abbdf
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Tue, 26 Oct 2021 14:01:42 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 28 Oct 2021 23:25:27 +02:00

x86/alternative: Implement .retpoline_sites support

Rewrite retpoline thunk call sites to be indirect calls for
spectre_v2=off. This ensures spectre_v2=off is as near to a
RETPOLINE=n build as possible.

This is the replacement for objtool writing alternative entries to
ensure the same and achieves feature-parity with the previous
approach.

One noteworthy feature is that it relies on the thunks to be in
machine order to compute the register index.

Specifically, this does not yet address the Jcc __x86_indirect_thunk_*
calls generated by clang, a future patch will add this.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.232495794@infradead.org
---
 arch/um/kernel/um_arch.c           |   4 +-
 arch/x86/include/asm/alternative.h |   1 +-
 arch/x86/kernel/alternative.c      | 141 +++++++++++++++++++++++++++-
 arch/x86/kernel/module.c           |   9 +-
 4 files changed, 150 insertions(+), 5 deletions(-)

diff --git a/arch/um/kernel/um_arch.c b/arch/um/kernel/um_arch.c
index a149a5e..5444769 100644
--- a/arch/um/kernel/um_arch.c
+++ b/arch/um/kernel/um_arch.c
@@ -421,6 +421,10 @@ void __init check_bugs(void)
 	os_check_bugs();
 }
 
+void apply_retpolines(s32 *start, s32 *end)
+{
+}
+
 void apply_alternatives(struct alt_instr *start, struct alt_instr *end)
 {
 }
diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index a3c2315..58eee64 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -75,6 +75,7 @@ extern int alternatives_patched;
 
 extern void alternative_instructions(void);
 extern void apply_alternatives(struct alt_instr *start, struct alt_instr *end);
+extern void apply_retpolines(s32 *start, s32 *end);
 
 struct module;
 
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index e9da3dc..5df4034 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -29,6 +29,7 @@
 #include <asm/io.h>
 #include <asm/fixmap.h>
 #include <asm/paravirt.h>
+#include <asm/asm-prototypes.h>
 
 int __read_mostly alternatives_patched;
 
@@ -113,6 +114,7 @@ static void __init_or_module add_nops(void *insns, unsigned int len)
 	}
 }
 
+extern s32 __retpoline_sites[], __retpoline_sites_end[];
 extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
 extern s32 __smp_locks[], __smp_locks_end[];
 void text_poke_early(void *addr, const void *opcode, size_t len);
@@ -221,7 +223,7 @@ static __always_inline int optimize_nops_range(u8 *instr, u8 instrlen, int off)
  * "noinline" to cause control flow change and thus invalidate I$ and
  * cause refetch after modification.
  */
-static void __init_or_module noinline optimize_nops(struct alt_instr *a, u8 *instr)
+static void __init_or_module noinline optimize_nops(u8 *instr, size_t len)
 {
 	struct insn insn;
 	int i = 0;
@@ -239,11 +241,11 @@ static void __init_or_module noinline optimize_nops(struct alt_instr *a, u8 *ins
 		 * optimized.
 		 */
 		if (insn.length == 1 && insn.opcode.bytes[0] == 0x90)
-			i += optimize_nops_range(instr, a->instrlen, i);
+			i += optimize_nops_range(instr, len, i);
 		else
 			i += insn.length;
 
-		if (i >= a->instrlen)
+		if (i >= len)
 			return;
 	}
 }
@@ -331,10 +333,135 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start,
 		text_poke_early(instr, insn_buff, insn_buff_sz);
 
 next:
-		optimize_nops(a, instr);
+		optimize_nops(instr, a->instrlen);
 	}
 }
 
+#if defined(CONFIG_RETPOLINE) && defined(CONFIG_STACK_VALIDATION)
+
+/*
+ * CALL/JMP *%\reg
+ */
+static int emit_indirect(int op, int reg, u8 *bytes)
+{
+	int i = 0;
+	u8 modrm;
+
+	switch (op) {
+	case CALL_INSN_OPCODE:
+		modrm = 0x10; /* Reg = 2; CALL r/m */
+		break;
+
+	case JMP32_INSN_OPCODE:
+		modrm = 0x20; /* Reg = 4; JMP r/m */
+		break;
+
+	default:
+		WARN_ON_ONCE(1);
+		return -1;
+	}
+
+	if (reg >= 8) {
+		bytes[i++] = 0x41; /* REX.B prefix */
+		reg -= 8;
+	}
+
+	modrm |= 0xc0; /* Mod = 3 */
+	modrm += reg;
+
+	bytes[i++] = 0xff; /* opcode */
+	bytes[i++] = modrm;
+
+	return i;
+}
+
+/*
+ * Rewrite the compiler generated retpoline thunk calls.
+ *
+ * For spectre_v2=off (!X86_FEATURE_RETPOLINE), rewrite them into immediate
+ * indirect instructions, avoiding the extra indirection.
+ *
+ * For example, convert:
+ *
+ *   CALL __x86_indirect_thunk_\reg
+ *
+ * into:
+ *
+ *   CALL *%\reg
+ *
+ */
+static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
+{
+	retpoline_thunk_t *target;
+	int reg, i = 0;
+
+	target = addr + insn->length + insn->immediate.value;
+	reg = target - __x86_indirect_thunk_array;
+
+	if (WARN_ON_ONCE(reg & ~0xf))
+		return -1;
+
+	/* If anyone ever does: CALL/JMP *%rsp, we're in deep trouble. */
+	BUG_ON(reg == 4);
+
+	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE))
+		return -1;
+
+	i = emit_indirect(insn->opcode.bytes[0], reg, bytes);
+	if (i < 0)
+		return i;
+
+	for (; i < insn->length;)
+		bytes[i++] = BYTES_NOP1;
+
+	return i;
+}
+
+/*
+ * Generated by 'objtool --retpoline'.
+ */
+void __init_or_module noinline apply_retpolines(s32 *start, s32 *end)
+{
+	s32 *s;
+
+	for (s = start; s < end; s++) {
+		void *addr = (void *)s + *s;
+		struct insn insn;
+		int len, ret;
+		u8 bytes[16];
+		u8 op1, op2;
+
+		ret = insn_decode_kernel(&insn, addr);
+		if (WARN_ON_ONCE(ret < 0))
+			continue;
+
+		op1 = insn.opcode.bytes[0];
+		op2 = insn.opcode.bytes[1];
+
+		switch (op1) {
+		case CALL_INSN_OPCODE:
+		case JMP32_INSN_OPCODE:
+			break;
+
+		default:
+			WARN_ON_ONCE(1);
+			continue;
+		}
+
+		len = patch_retpoline(addr, &insn, bytes);
+		if (len == insn.length) {
+			optimize_nops(bytes, len);
+			text_poke_early(addr, bytes, len);
+		}
+	}
+}
+
+#else /* !RETPOLINES || !CONFIG_STACK_VALIDATION */
+
+void __init_or_module noinline apply_retpolines(s32 *start, s32 *end) { }
+
+#endif /* CONFIG_RETPOLINE && CONFIG_STACK_VALIDATION */
+
 #ifdef CONFIG_SMP
 static void alternatives_smp_lock(const s32 *start, const s32 *end,
 				  u8 *text, u8 *text_end)
@@ -643,6 +770,12 @@ void __init alternative_instructions(void)
 	apply_paravirt(__parainstructions, __parainstructions_end);
 
 	/*
+	 * Rewrite the retpolines, must be done before alternatives since
+	 * those can rewrite the retpoline thunks.
+	 */
+	apply_retpolines(__retpoline_sites, __retpoline_sites_end);
+
+	/*
 	 * Then patch alternatives, such that those paravirt calls that are in
 	 * alternatives can be overwritten by their immediate fragments.
 	 */
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index 5e9a34b..169fb6f 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -251,7 +251,8 @@ int module_finalize(const Elf_Ehdr *hdr,
 		    struct module *me)
 {
 	const Elf_Shdr *s, *text = NULL, *alt = NULL, *locks = NULL,
-		*para = NULL, *orc = NULL, *orc_ip = NULL;
+		*para = NULL, *orc = NULL, *orc_ip = NULL,
+		*retpolines = NULL;
 	char *secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;
 
 	for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) {
@@ -267,8 +268,14 @@ int module_finalize(const Elf_Ehdr *hdr,
 			orc = s;
 		if (!strcmp(".orc_unwind_ip", secstrings + s->sh_name))
 			orc_ip = s;
+		if (!strcmp(".retpoline_sites", secstrings + s->sh_name))
+			retpolines = s;
 	}
 
+	if (retpolines) {
+		void *rseg = (void *)retpolines->sh_addr;
+		apply_retpolines(rseg, rseg + retpolines->sh_size);
+	}
 	if (alt) {
 		/* patch .altinstructions */
 		void *aseg = (void *)alt->sh_addr;

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [tip: objtool/core] x86/alternative: Handle Jcc __x86_indirect_thunk_\reg
  2021-10-26 12:01 ` [PATCH v3 11/16] x86/alternative: Handle Jcc __x86_indirect_thunk_\reg Peter Zijlstra
  2021-10-28  9:29   ` Borislav Petkov
@ 2021-10-29  8:02   ` tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 48+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-10-29  8:02 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel),
	Borislav Petkov, Josh Poimboeuf, Alexei Starovoitov, x86,
	linux-kernel

The following commit has been merged into the objtool/core branch of tip:

Commit-ID:     2f0cbb2a8e5bbf101e9de118fc0eb168111a5e1e
Gitweb:        https://git.kernel.org/tip/2f0cbb2a8e5bbf101e9de118fc0eb168111a5e1e
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Tue, 26 Oct 2021 14:01:43 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 28 Oct 2021 23:25:28 +02:00

x86/alternative: Handle Jcc __x86_indirect_thunk_\reg

Handle the rare cases where the compiler (clang) does an indirect
conditional tail-call using:

  Jcc __x86_indirect_thunk_\reg

For the !RETPOLINE case this can be rewritten to fit the original (6
byte) instruction like:

  Jncc.d8	1f
  JMP		*%\reg
  NOP
1:

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.296470217@infradead.org
---
 arch/x86/kernel/alternative.c | 40 ++++++++++++++++++++++++++++++----
 1 file changed, 36 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 5df4034..1dea2f6 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -393,7 +393,8 @@ static int emit_indirect(int op, int reg, u8 *bytes)
 static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
 {
 	retpoline_thunk_t *target;
-	int reg, i = 0;
+	int reg, ret, i = 0;
+	u8 op, cc;
 
 	target = addr + insn->length + insn->immediate.value;
 	reg = target - __x86_indirect_thunk_array;
@@ -407,9 +408,36 @@ static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
 	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE))
 		return -1;
 
-	i = emit_indirect(insn->opcode.bytes[0], reg, bytes);
-	if (i < 0)
-		return i;
+	op = insn->opcode.bytes[0];
+
+	/*
+	 * Convert:
+	 *
+	 *   Jcc.d32 __x86_indirect_thunk_\reg
+	 *
+	 * into:
+	 *
+	 *   Jncc.d8 1f
+	 *   JMP *%\reg
+	 *   NOP
+	 * 1:
+	 */
+	/* Jcc.d32 second opcode byte is in the range: 0x80-0x8f */
+	if (op == 0x0f && (insn->opcode.bytes[1] & 0xf0) == 0x80) {
+		cc = insn->opcode.bytes[1] & 0xf;
+		cc ^= 1; /* invert condition */
+
+		bytes[i++] = 0x70 + cc;        /* Jcc.d8 */
+		bytes[i++] = insn->length - 2; /* sizeof(Jcc.d8) == 2 */
+
+		/* Continue as if: JMP.d32 __x86_indirect_thunk_\reg */
+		op = JMP32_INSN_OPCODE;
+	}
+
+	ret = emit_indirect(op, reg, bytes + i);
+	if (ret < 0)
+		return ret;
+	i += ret;
 
 	for (; i < insn->length;)
 		bytes[i++] = BYTES_NOP1;
@@ -443,6 +471,10 @@ void __init_or_module noinline apply_retpolines(s32 *start, s32 *end)
 		case JMP32_INSN_OPCODE:
 			break;
 
+		case 0x0f: /* escape */
+			if (op2 >= 0x80 && op2 <= 0x8f)
+				break;
+			fallthrough;
 		default:
 			WARN_ON_ONCE(1);
 			continue;

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [tip: objtool/core] x86/retpoline: Create a retpoline thunk array
  2021-10-26 12:01 ` [PATCH v3 09/16] x86/retpoline: Create a retpoline thunk array Peter Zijlstra
@ 2021-10-29  8:03   ` tip-bot2 for Peter Zijlstra
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-10-29  8:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel),
	Borislav Petkov, Josh Poimboeuf, Alexei Starovoitov, x86,
	linux-kernel

The following commit has been merged into the objtool/core branch of tip:

Commit-ID:     1a6f74429c42a3854980359a758e222005712aee
Gitweb:        https://git.kernel.org/tip/1a6f74429c42a3854980359a758e222005712aee
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Tue, 26 Oct 2021 14:01:41 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 28 Oct 2021 23:25:27 +02:00

x86/retpoline: Create a retpoline thunk array

Stick all the retpolines in a single symbol and have the individual
thunks as inner labels, this should guarantee thunk order and layout.

Previously there were 16 (or rather 15 without rsp) separate symbols and
a toolchain might reasonably expect it could displace them however it
liked, with disregard for their relative position.

However, now they're part of a larger symbol. Any change to their
relative position would disrupt this larger _array symbol and thus not
be sound.

This is the same reasoning used for data symbols. On their own there
is no guarantee about their relative position wrt to one aonther, but
we're still able to do arrays because an array as a whole is a single
larger symbol.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.169659320@infradead.org
---
 arch/x86/include/asm/nospec-branch.h |  8 +++++++-
 arch/x86/lib/retpoline.S             | 14 +++++++++-----
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 14053cd..e22aedb 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -12,6 +12,8 @@
 #include <asm/msr-index.h>
 #include <asm/unwind_hints.h>
 
+#define RETPOLINE_THUNK_SIZE	32
+
 /*
  * Fill the CPU return stack buffer.
  *
@@ -120,11 +122,15 @@
 
 #ifdef CONFIG_RETPOLINE
 
+typedef u8 retpoline_thunk_t[RETPOLINE_THUNK_SIZE];
+
 #define GEN(reg) \
-	extern asmlinkage void __x86_indirect_thunk_ ## reg (void);
+	extern retpoline_thunk_t __x86_indirect_thunk_ ## reg;
 #include <asm/GEN-for-each-reg.h>
 #undef GEN
 
+extern retpoline_thunk_t __x86_indirect_thunk_array[];
+
 #ifdef CONFIG_X86_64
 
 /*
diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
index 4c910fa..cf0b39f 100644
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -28,16 +28,14 @@
 
 .macro THUNK reg
 
-	.align 32
-
-SYM_FUNC_START(__x86_indirect_thunk_\reg)
+	.align RETPOLINE_THUNK_SIZE
+SYM_INNER_LABEL(__x86_indirect_thunk_\reg, SYM_L_GLOBAL)
+	UNWIND_HINT_EMPTY
 
 	ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; jmp *%\reg), \
 		      __stringify(RETPOLINE \reg), X86_FEATURE_RETPOLINE, \
 		      __stringify(lfence; ANNOTATE_RETPOLINE_SAFE; jmp *%\reg), X86_FEATURE_RETPOLINE_AMD
 
-SYM_FUNC_END(__x86_indirect_thunk_\reg)
-
 .endm
 
 /*
@@ -55,10 +53,16 @@ SYM_FUNC_END(__x86_indirect_thunk_\reg)
 #define __EXPORT_THUNK(sym)	_ASM_NOKPROBE(sym); EXPORT_SYMBOL(sym)
 #define EXPORT_THUNK(reg)	__EXPORT_THUNK(__x86_indirect_thunk_ ## reg)
 
+	.align RETPOLINE_THUNK_SIZE
+SYM_CODE_START(__x86_indirect_thunk_array)
+
 #define GEN(reg) THUNK reg
 #include <asm/GEN-for-each-reg.h>
 #undef GEN
 
+	.align RETPOLINE_THUNK_SIZE
+SYM_CODE_END(__x86_indirect_thunk_array)
+
 #define GEN(reg) EXPORT_THUNK(reg)
 #include <asm/GEN-for-each-reg.h>
 #undef GEN

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [tip: objtool/core] x86/retpoline: Move the retpoline thunk declarations to nospec-branch.h
  2021-10-26 12:01 ` [PATCH v3 08/16] x86/retpoline: Move the retpoline thunk declarations to nospec-branch.h Peter Zijlstra
@ 2021-10-29  8:03   ` tip-bot2 for Peter Zijlstra
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-10-29  8:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel),
	Borislav Petkov, Josh Poimboeuf, Alexei Starovoitov, x86,
	linux-kernel

The following commit has been merged into the objtool/core branch of tip:

Commit-ID:     6fda8a38865607db739be3e567a2387376222dbd
Gitweb:        https://git.kernel.org/tip/6fda8a38865607db739be3e567a2387376222dbd
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Tue, 26 Oct 2021 14:01:40 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 28 Oct 2021 23:25:27 +02:00

x86/retpoline: Move the retpoline thunk declarations to nospec-branch.h

Because it makes no sense to split the retpoline gunk over multiple
headers.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.106290934@infradead.org
---
 arch/x86/include/asm/asm-prototypes.h | 8 --------
 arch/x86/include/asm/nospec-branch.h  | 7 +++++++
 arch/x86/net/bpf_jit_comp.c           | 1 -
 3 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/asm-prototypes.h b/arch/x86/include/asm/asm-prototypes.h
index a2bed09..8f80de6 100644
--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -17,11 +17,3 @@
 extern void cmpxchg8b_emu(void);
 #endif
 
-#ifdef CONFIG_RETPOLINE
-
-#define GEN(reg) \
-	extern asmlinkage void __x86_indirect_thunk_ ## reg (void);
-#include <asm/GEN-for-each-reg.h>
-#undef GEN
-
-#endif /* CONFIG_RETPOLINE */
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index ec2d5c8..14053cd 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -5,6 +5,7 @@
 
 #include <linux/static_key.h>
 #include <linux/objtool.h>
+#include <linux/linkage.h>
 
 #include <asm/alternative.h>
 #include <asm/cpufeatures.h>
@@ -118,6 +119,12 @@
 	".popsection\n\t"
 
 #ifdef CONFIG_RETPOLINE
+
+#define GEN(reg) \
+	extern asmlinkage void __x86_indirect_thunk_ ## reg (void);
+#include <asm/GEN-for-each-reg.h>
+#undef GEN
+
 #ifdef CONFIG_X86_64
 
 /*
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 9ea5738..7c03af6 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -15,7 +15,6 @@
 #include <asm/set_memory.h>
 #include <asm/nospec-branch.h>
 #include <asm/text-patching.h>
-#include <asm/asm-prototypes.h>
 
 static u8 *emit_code(u8 *ptr, u32 bytes, unsigned int len)
 {

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [tip: objtool/core] x86/asm: Fixup odd GEN-for-each-reg.h usage
  2021-10-26 12:01 ` [PATCH v3 07/16] x86/asm: Fixup odd GEN-for-each-reg.h usage Peter Zijlstra
@ 2021-10-29  8:03   ` tip-bot2 for Peter Zijlstra
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-10-29  8:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel),
	Borislav Petkov, Josh Poimboeuf, Alexei Starovoitov, x86,
	linux-kernel

The following commit has been merged into the objtool/core branch of tip:

Commit-ID:     b6d3d9944bd7c9e8c06994ead3c9952f673f2a66
Gitweb:        https://git.kernel.org/tip/b6d3d9944bd7c9e8c06994ead3c9952f673f2a66
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Tue, 26 Oct 2021 14:01:39 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 28 Oct 2021 23:25:26 +02:00

x86/asm: Fixup odd GEN-for-each-reg.h usage

Currently GEN-for-each-reg.h usage leaves GEN defined, relying on any
subsequent usage to start with #undef, which is rude.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.041792350@infradead.org
---
 arch/x86/include/asm/asm-prototypes.h | 2 +-
 arch/x86/lib/retpoline.S              | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/asm-prototypes.h b/arch/x86/include/asm/asm-prototypes.h
index a28c5ca..a2bed09 100644
--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -19,9 +19,9 @@ extern void cmpxchg8b_emu(void);
 
 #ifdef CONFIG_RETPOLINE
 
-#undef GEN
 #define GEN(reg) \
 	extern asmlinkage void __x86_indirect_thunk_ ## reg (void);
 #include <asm/GEN-for-each-reg.h>
+#undef GEN
 
 #endif /* CONFIG_RETPOLINE */
diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
index a91e0dc..4c910fa 100644
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -55,10 +55,10 @@ SYM_FUNC_END(__x86_indirect_thunk_\reg)
 #define __EXPORT_THUNK(sym)	_ASM_NOKPROBE(sym); EXPORT_SYMBOL(sym)
 #define EXPORT_THUNK(reg)	__EXPORT_THUNK(__x86_indirect_thunk_ ## reg)
 
-#undef GEN
 #define GEN(reg) THUNK reg
 #include <asm/GEN-for-each-reg.h>
-
 #undef GEN
+
 #define GEN(reg) EXPORT_THUNK(reg)
 #include <asm/GEN-for-each-reg.h>
+#undef GEN

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [tip: objtool/core] x86/asm: Fix register order
  2021-10-26 12:01 ` [PATCH v3 06/16] x86/asm: Fix register order Peter Zijlstra
@ 2021-10-29  8:03   ` tip-bot2 for Peter Zijlstra
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-10-29  8:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel),
	Borislav Petkov, Josh Poimboeuf, Alexei Starovoitov, x86,
	linux-kernel

The following commit has been merged into the objtool/core branch of tip:

Commit-ID:     a92ede2d584a2e070def59c7e47e6b6f6341c55c
Gitweb:        https://git.kernel.org/tip/a92ede2d584a2e070def59c7e47e6b6f6341c55c
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Tue, 26 Oct 2021 14:01:38 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 28 Oct 2021 23:25:26 +02:00

x86/asm: Fix register order

Ensure the register order is correct; this allows for easy translation
between register number and trampoline and vice-versa.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120309.978573921@infradead.org
---
 arch/x86/include/asm/GEN-for-each-reg.h | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/GEN-for-each-reg.h b/arch/x86/include/asm/GEN-for-each-reg.h
index 1b07fb1..0794910 100644
--- a/arch/x86/include/asm/GEN-for-each-reg.h
+++ b/arch/x86/include/asm/GEN-for-each-reg.h
@@ -1,11 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * These are in machine order; things rely on that.
+ */
 #ifdef CONFIG_64BIT
 GEN(rax)
-GEN(rbx)
 GEN(rcx)
 GEN(rdx)
+GEN(rbx)
+GEN(rsp)
+GEN(rbp)
 GEN(rsi)
 GEN(rdi)
-GEN(rbp)
 GEN(r8)
 GEN(r9)
 GEN(r10)
@@ -16,10 +21,11 @@ GEN(r14)
 GEN(r15)
 #else
 GEN(eax)
-GEN(ebx)
 GEN(ecx)
 GEN(edx)
+GEN(ebx)
+GEN(esp)
+GEN(ebp)
 GEN(esi)
 GEN(edi)
-GEN(ebp)
 #endif

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [tip: objtool/core] x86/retpoline: Remove unused replacement symbols
  2021-10-26 12:01 ` [PATCH v3 05/16] x86/retpoline: Remove unused replacement symbols Peter Zijlstra
@ 2021-10-29  8:03   ` tip-bot2 for Peter Zijlstra
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-10-29  8:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel),
	Borislav Petkov, Josh Poimboeuf, Alexei Starovoitov, x86,
	linux-kernel

The following commit has been merged into the objtool/core branch of tip:

Commit-ID:     4fe79e710d9574a14993f8b4e16b7252da72d5e8
Gitweb:        https://git.kernel.org/tip/4fe79e710d9574a14993f8b4e16b7252da72d5e8
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Tue, 26 Oct 2021 14:01:37 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 28 Oct 2021 23:25:26 +02:00

x86/retpoline: Remove unused replacement symbols

Now that objtool no longer creates alternatives, these replacement
symbols are no longer needed, remove them.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120309.915051744@infradead.org
---
 arch/x86/include/asm/asm-prototypes.h | 10 +------
 arch/x86/lib/retpoline.S              | 42 +--------------------------
 2 files changed, 52 deletions(-)

diff --git a/arch/x86/include/asm/asm-prototypes.h b/arch/x86/include/asm/asm-prototypes.h
index 4cb726c..a28c5ca 100644
--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -24,14 +24,4 @@ extern void cmpxchg8b_emu(void);
 	extern asmlinkage void __x86_indirect_thunk_ ## reg (void);
 #include <asm/GEN-for-each-reg.h>
 
-#undef GEN
-#define GEN(reg) \
-	extern asmlinkage void __x86_indirect_alt_call_ ## reg (void);
-#include <asm/GEN-for-each-reg.h>
-
-#undef GEN
-#define GEN(reg) \
-	extern asmlinkage void __x86_indirect_alt_jmp_ ## reg (void);
-#include <asm/GEN-for-each-reg.h>
-
 #endif /* CONFIG_RETPOLINE */
diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
index ec9922c..a91e0dc 100644
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -41,36 +41,6 @@ SYM_FUNC_END(__x86_indirect_thunk_\reg)
 .endm
 
 /*
- * This generates .altinstr_replacement symbols for use by objtool. They,
- * however, must not actually live in .altinstr_replacement since that will be
- * discarded after init, but module alternatives will also reference these
- * symbols.
- *
- * Their names matches the "__x86_indirect_" prefix to mark them as retpolines.
- */
-.macro ALT_THUNK reg
-
-	.align 1
-
-SYM_FUNC_START_NOALIGN(__x86_indirect_alt_call_\reg)
-	ANNOTATE_RETPOLINE_SAFE
-1:	call	*%\reg
-2:	.skip	5-(2b-1b), 0x90
-SYM_FUNC_END(__x86_indirect_alt_call_\reg)
-
-STACK_FRAME_NON_STANDARD(__x86_indirect_alt_call_\reg)
-
-SYM_FUNC_START_NOALIGN(__x86_indirect_alt_jmp_\reg)
-	ANNOTATE_RETPOLINE_SAFE
-1:	jmp	*%\reg
-2:	.skip	5-(2b-1b), 0x90
-SYM_FUNC_END(__x86_indirect_alt_jmp_\reg)
-
-STACK_FRAME_NON_STANDARD(__x86_indirect_alt_jmp_\reg)
-
-.endm
-
-/*
  * Despite being an assembler file we can't just use .irp here
  * because __KSYM_DEPS__ only uses the C preprocessor and would
  * only see one instance of "__x86_indirect_thunk_\reg" rather
@@ -92,15 +62,3 @@ STACK_FRAME_NON_STANDARD(__x86_indirect_alt_jmp_\reg)
 #undef GEN
 #define GEN(reg) EXPORT_THUNK(reg)
 #include <asm/GEN-for-each-reg.h>
-
-#undef GEN
-#define GEN(reg) ALT_THUNK reg
-#include <asm/GEN-for-each-reg.h>
-
-#undef GEN
-#define GEN(reg) __EXPORT_THUNK(__x86_indirect_alt_call_ ## reg)
-#include <asm/GEN-for-each-reg.h>
-
-#undef GEN
-#define GEN(reg) __EXPORT_THUNK(__x86_indirect_alt_jmp_ ## reg)
-#include <asm/GEN-for-each-reg.h>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [tip: objtool/core] objtool,x86: Replace alternatives with .retpoline_sites
  2021-10-26 12:01 ` [PATCH v3 04/16] objtool,x86: Replace alternatives with .retpoline_sites Peter Zijlstra
@ 2021-10-29  8:03   ` tip-bot2 for Peter Zijlstra
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-10-29  8:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel),
	Borislav Petkov, Josh Poimboeuf, Alexei Starovoitov, x86,
	linux-kernel

The following commit has been merged into the objtool/core branch of tip:

Commit-ID:     134ab5bd1883312d7a4b3033b05c6b5a1bb8889b
Gitweb:        https://git.kernel.org/tip/134ab5bd1883312d7a4b3033b05c6b5a1bb8889b
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Tue, 26 Oct 2021 14:01:36 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 28 Oct 2021 23:25:25 +02:00

objtool,x86: Replace alternatives with .retpoline_sites

Instead of writing complete alternatives, simply provide a list of all
the retpoline thunk calls. Then the kernel is free to do with them as
it pleases. Simpler code all-round.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120309.850007165@infradead.org
---
 arch/x86/kernel/vmlinux.lds.S       |  14 +++-
 tools/objtool/arch/x86/decode.c     | 120 +-------------------------
 tools/objtool/check.c               | 132 +++++++++++++++++++--------
 tools/objtool/elf.c                 |  84 +-----------------
 tools/objtool/include/objtool/elf.h |   1 +-
 tools/objtool/special.c             |   8 +--
 6 files changed, 107 insertions(+), 252 deletions(-)

diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index efd9e9e..3d6dc12 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -272,6 +272,20 @@ SECTIONS
 		__parainstructions_end = .;
 	}
 
+#ifdef CONFIG_RETPOLINE
+	/*
+	 * List of instructions that call/jmp/jcc to retpoline thunks
+	 * __x86_indirect_thunk_*(). These instructions can be patched along
+	 * with alternatives, after which the section can be freed.
+	 */
+	. = ALIGN(8);
+	.retpoline_sites : AT(ADDR(.retpoline_sites) - LOAD_OFFSET) {
+		__retpoline_sites = .;
+		*(.retpoline_sites)
+		__retpoline_sites_end = .;
+	}
+#endif
+
 	/*
 	 * struct alt_inst entries. From the header (alternative.h):
 	 * "Alternative instructions for different CPU types or capabilities"
diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
index 1f2ae70..4d6d7fc 100644
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -711,126 +711,6 @@ const char *arch_ret_insn(int len)
 	return ret[len-1];
 }
 
-/* asm/alternative.h ? */
-
-#define ALTINSTR_FLAG_INV	(1 << 15)
-#define ALT_NOT(feat)		((feat) | ALTINSTR_FLAG_INV)
-
-struct alt_instr {
-	s32 instr_offset;	/* original instruction */
-	s32 repl_offset;	/* offset to replacement instruction */
-	u16 cpuid;		/* cpuid bit set for replacement */
-	u8  instrlen;		/* length of original instruction */
-	u8  replacementlen;	/* length of new instruction */
-} __packed;
-
-static int elf_add_alternative(struct elf *elf,
-			       struct instruction *orig, struct symbol *sym,
-			       int cpuid, u8 orig_len, u8 repl_len)
-{
-	const int size = sizeof(struct alt_instr);
-	struct alt_instr *alt;
-	struct section *sec;
-	Elf_Scn *s;
-
-	sec = find_section_by_name(elf, ".altinstructions");
-	if (!sec) {
-		sec = elf_create_section(elf, ".altinstructions",
-					 SHF_ALLOC, 0, 0);
-
-		if (!sec) {
-			WARN_ELF("elf_create_section");
-			return -1;
-		}
-	}
-
-	s = elf_getscn(elf->elf, sec->idx);
-	if (!s) {
-		WARN_ELF("elf_getscn");
-		return -1;
-	}
-
-	sec->data = elf_newdata(s);
-	if (!sec->data) {
-		WARN_ELF("elf_newdata");
-		return -1;
-	}
-
-	sec->data->d_size = size;
-	sec->data->d_align = 1;
-
-	alt = sec->data->d_buf = malloc(size);
-	if (!sec->data->d_buf) {
-		perror("malloc");
-		return -1;
-	}
-	memset(sec->data->d_buf, 0, size);
-
-	if (elf_add_reloc_to_insn(elf, sec, sec->sh.sh_size,
-				  R_X86_64_PC32, orig->sec, orig->offset)) {
-		WARN("elf_create_reloc: alt_instr::instr_offset");
-		return -1;
-	}
-
-	if (elf_add_reloc(elf, sec, sec->sh.sh_size + 4,
-			  R_X86_64_PC32, sym, 0)) {
-		WARN("elf_create_reloc: alt_instr::repl_offset");
-		return -1;
-	}
-
-	alt->cpuid = bswap_if_needed(cpuid);
-	alt->instrlen = orig_len;
-	alt->replacementlen = repl_len;
-
-	sec->sh.sh_size += size;
-	sec->changed = true;
-
-	return 0;
-}
-
-#define X86_FEATURE_RETPOLINE                ( 7*32+12)
-
-int arch_rewrite_retpolines(struct objtool_file *file)
-{
-	struct instruction *insn;
-	struct reloc *reloc;
-	struct symbol *sym;
-	char name[32] = "";
-
-	list_for_each_entry(insn, &file->retpoline_call_list, call_node) {
-
-		if (insn->type != INSN_JUMP_DYNAMIC &&
-		    insn->type != INSN_CALL_DYNAMIC)
-			continue;
-
-		if (!strcmp(insn->sec->name, ".text.__x86.indirect_thunk"))
-			continue;
-
-		reloc = insn->reloc;
-
-		sprintf(name, "__x86_indirect_alt_%s_%s",
-			insn->type == INSN_JUMP_DYNAMIC ? "jmp" : "call",
-			reloc->sym->name + 21);
-
-		sym = find_symbol_by_name(file->elf, name);
-		if (!sym) {
-			sym = elf_create_undef_symbol(file->elf, name);
-			if (!sym) {
-				WARN("elf_create_undef_symbol");
-				return -1;
-			}
-		}
-
-		if (elf_add_alternative(file->elf, insn, sym,
-					ALT_NOT(X86_FEATURE_RETPOLINE), 5, 5)) {
-			WARN("elf_add_alternative");
-			return -1;
-		}
-	}
-
-	return 0;
-}
-
 int arch_decode_hint_reg(u8 sp_reg, int *base)
 {
 	switch (sp_reg) {
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index ce3c25f..fb3f251 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -683,6 +683,52 @@ static int create_static_call_sections(struct objtool_file *file)
 	return 0;
 }
 
+static int create_retpoline_sites_sections(struct objtool_file *file)
+{
+	struct instruction *insn;
+	struct section *sec;
+	int idx;
+
+	sec = find_section_by_name(file->elf, ".retpoline_sites");
+	if (sec) {
+		WARN("file already has .retpoline_sites, skipping");
+		return 0;
+	}
+
+	idx = 0;
+	list_for_each_entry(insn, &file->retpoline_call_list, call_node)
+		idx++;
+
+	if (!idx)
+		return 0;
+
+	sec = elf_create_section(file->elf, ".retpoline_sites", 0,
+				 sizeof(int), idx);
+	if (!sec) {
+		WARN("elf_create_section: .retpoline_sites");
+		return -1;
+	}
+
+	idx = 0;
+	list_for_each_entry(insn, &file->retpoline_call_list, call_node) {
+
+		int *site = (int *)sec->data->d_buf + idx;
+		*site = 0;
+
+		if (elf_add_reloc_to_insn(file->elf, sec,
+					  idx * sizeof(int),
+					  R_X86_64_PC32,
+					  insn->sec, insn->offset)) {
+			WARN("elf_add_reloc_to_insn: .retpoline_sites");
+			return -1;
+		}
+
+		idx++;
+	}
+
+	return 0;
+}
+
 static int create_mcount_loc_sections(struct objtool_file *file)
 {
 	struct section *sec;
@@ -1016,6 +1062,11 @@ static void annotate_call_site(struct objtool_file *file,
 		return;
 	}
 
+	if (sym->retpoline_thunk) {
+		list_add_tail(&insn->call_node, &file->retpoline_call_list);
+		return;
+	}
+
 	/*
 	 * Many compilers cannot disable KCOV with a function attribute
 	 * so they need a little help, NOP out any KCOV calls from noinstr
@@ -1075,6 +1126,39 @@ static void add_call_dest(struct objtool_file *file, struct instruction *insn,
 	annotate_call_site(file, insn, sibling);
 }
 
+static void add_retpoline_call(struct objtool_file *file, struct instruction *insn)
+{
+	/*
+	 * Retpoline calls/jumps are really dynamic calls/jumps in disguise,
+	 * so convert them accordingly.
+	 */
+	switch (insn->type) {
+	case INSN_CALL:
+		insn->type = INSN_CALL_DYNAMIC;
+		break;
+	case INSN_JUMP_UNCONDITIONAL:
+		insn->type = INSN_JUMP_DYNAMIC;
+		break;
+	case INSN_JUMP_CONDITIONAL:
+		insn->type = INSN_JUMP_DYNAMIC_CONDITIONAL;
+		break;
+	default:
+		return;
+	}
+
+	insn->retpoline_safe = true;
+
+	/*
+	 * Whatever stack impact regular CALLs have, should be undone
+	 * by the RETURN of the called function.
+	 *
+	 * Annotated intra-function calls retain the stack_ops but
+	 * are converted to JUMP, see read_intra_function_calls().
+	 */
+	remove_insn_ops(insn);
+
+	annotate_call_site(file, insn, false);
+}
 /*
  * Find the destination instructions for all jumps.
  */
@@ -1097,19 +1181,7 @@ static int add_jump_destinations(struct objtool_file *file)
 			dest_sec = reloc->sym->sec;
 			dest_off = arch_dest_reloc_offset(reloc->addend);
 		} else if (reloc->sym->retpoline_thunk) {
-			/*
-			 * Retpoline jumps are really dynamic jumps in
-			 * disguise, so convert them accordingly.
-			 */
-			if (insn->type == INSN_JUMP_UNCONDITIONAL)
-				insn->type = INSN_JUMP_DYNAMIC;
-			else
-				insn->type = INSN_JUMP_DYNAMIC_CONDITIONAL;
-
-			list_add_tail(&insn->call_node,
-				      &file->retpoline_call_list);
-
-			insn->retpoline_safe = true;
+			add_retpoline_call(file, insn);
 			continue;
 		} else if (insn->func) {
 			/* internal or external sibling call (with reloc) */
@@ -1238,18 +1310,7 @@ static int add_call_destinations(struct objtool_file *file)
 			add_call_dest(file, insn, dest, false);
 
 		} else if (reloc->sym->retpoline_thunk) {
-			/*
-			 * Retpoline calls are really dynamic calls in
-			 * disguise, so convert them accordingly.
-			 */
-			insn->type = INSN_CALL_DYNAMIC;
-			insn->retpoline_safe = true;
-
-			list_add_tail(&insn->call_node,
-				      &file->retpoline_call_list);
-
-			remove_insn_ops(insn);
-			continue;
+			add_retpoline_call(file, insn);
 
 		} else
 			add_call_dest(file, insn, reloc->sym, false);
@@ -1980,11 +2041,6 @@ static void mark_rodata(struct objtool_file *file)
 	file->rodata = found;
 }
 
-__weak int arch_rewrite_retpolines(struct objtool_file *file)
-{
-	return 0;
-}
-
 static int decode_sections(struct objtool_file *file)
 {
 	int ret;
@@ -2057,15 +2113,6 @@ static int decode_sections(struct objtool_file *file)
 	if (ret)
 		return ret;
 
-	/*
-	 * Must be after add_special_section_alts(), since this will emit
-	 * alternatives. Must be after add_{jump,call}_destination(), since
-	 * those create the call insn lists.
-	 */
-	ret = arch_rewrite_retpolines(file);
-	if (ret)
-		return ret;
-
 	return 0;
 }
 
@@ -3468,6 +3515,13 @@ int check(struct objtool_file *file)
 		goto out;
 	warnings += ret;
 
+	if (retpoline) {
+		ret = create_retpoline_sites_sections(file);
+		if (ret < 0)
+			goto out;
+		warnings += ret;
+	}
+
 	if (mcount) {
 		ret = create_mcount_loc_sections(file);
 		if (ret < 0)
diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c
index b18f005..5c02935 100644
--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -740,90 +740,6 @@ static int elf_add_string(struct elf *elf, struct section *strtab, char *str)
 	return len;
 }
 
-struct symbol *elf_create_undef_symbol(struct elf *elf, const char *name)
-{
-	struct section *symtab, *symtab_shndx;
-	struct symbol *sym;
-	Elf_Data *data;
-	Elf_Scn *s;
-
-	sym = malloc(sizeof(*sym));
-	if (!sym) {
-		perror("malloc");
-		return NULL;
-	}
-	memset(sym, 0, sizeof(*sym));
-
-	sym->name = strdup(name);
-
-	sym->sym.st_name = elf_add_string(elf, NULL, sym->name);
-	if (sym->sym.st_name == -1)
-		return NULL;
-
-	sym->sym.st_info = GELF_ST_INFO(STB_GLOBAL, STT_NOTYPE);
-	// st_other 0
-	// st_shndx 0
-	// st_value 0
-	// st_size 0
-
-	symtab = find_section_by_name(elf, ".symtab");
-	if (!symtab) {
-		WARN("can't find .symtab");
-		return NULL;
-	}
-
-	s = elf_getscn(elf->elf, symtab->idx);
-	if (!s) {
-		WARN_ELF("elf_getscn");
-		return NULL;
-	}
-
-	data = elf_newdata(s);
-	if (!data) {
-		WARN_ELF("elf_newdata");
-		return NULL;
-	}
-
-	data->d_buf = &sym->sym;
-	data->d_size = sizeof(sym->sym);
-	data->d_align = 1;
-	data->d_type = ELF_T_SYM;
-
-	sym->idx = symtab->sh.sh_size / sizeof(sym->sym);
-
-	symtab->sh.sh_size += data->d_size;
-	symtab->changed = true;
-
-	symtab_shndx = find_section_by_name(elf, ".symtab_shndx");
-	if (symtab_shndx) {
-		s = elf_getscn(elf->elf, symtab_shndx->idx);
-		if (!s) {
-			WARN_ELF("elf_getscn");
-			return NULL;
-		}
-
-		data = elf_newdata(s);
-		if (!data) {
-			WARN_ELF("elf_newdata");
-			return NULL;
-		}
-
-		data->d_buf = &sym->sym.st_size; /* conveniently 0 */
-		data->d_size = sizeof(Elf32_Word);
-		data->d_align = 4;
-		data->d_type = ELF_T_WORD;
-
-		symtab_shndx->sh.sh_size += 4;
-		symtab_shndx->changed = true;
-	}
-
-	sym->sec = find_section_by_index(elf, 0);
-
-	elf_add_symbol(elf, sym);
-
-	return sym;
-}
-
 struct section *elf_create_section(struct elf *elf, const char *name,
 				   unsigned int sh_flags, size_t entsize, int nr)
 {
diff --git a/tools/objtool/include/objtool/elf.h b/tools/objtool/include/objtool/elf.h
index c2dbf53..cdc739f 100644
--- a/tools/objtool/include/objtool/elf.h
+++ b/tools/objtool/include/objtool/elf.h
@@ -144,7 +144,6 @@ int elf_write_insn(struct elf *elf, struct section *sec,
 		   unsigned long offset, unsigned int len,
 		   const char *insn);
 int elf_write_reloc(struct elf *elf, struct reloc *reloc);
-struct symbol *elf_create_undef_symbol(struct elf *elf, const char *name);
 int elf_write(struct elf *elf);
 void elf_close(struct elf *elf);
 
diff --git a/tools/objtool/special.c b/tools/objtool/special.c
index 06c3eac..e2223dd 100644
--- a/tools/objtool/special.c
+++ b/tools/objtool/special.c
@@ -109,14 +109,6 @@ static int get_alt_entry(struct elf *elf, struct special_entry *entry,
 			return -1;
 		}
 
-		/*
-		 * Skip retpoline .altinstr_replacement... we already rewrite the
-		 * instructions for retpolines anyway, see arch_is_retpoline()
-		 * usage in add_{call,jump}_destinations().
-		 */
-		if (arch_is_retpoline(new_reloc->sym))
-			return 1;
-
 		reloc_to_sec_off(new_reloc, &alt->new_sec, &alt->new_off);
 
 		/* _ASM_EXTABLE_EX hack */

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [tip: objtool/core] objtool: Shrink struct instruction
  2021-10-26 12:01 ` [PATCH v3 03/16] objtool: Shrink struct instruction Peter Zijlstra
@ 2021-10-29  8:03   ` tip-bot2 for Peter Zijlstra
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-10-29  8:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel),
	Borislav Petkov, Josh Poimboeuf, Alexei Starovoitov, x86,
	linux-kernel

The following commit has been merged into the objtool/core branch of tip:

Commit-ID:     c509331b41b7365e17396c246e8c5797bccc8074
Gitweb:        https://git.kernel.org/tip/c509331b41b7365e17396c246e8c5797bccc8074
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Tue, 26 Oct 2021 14:01:35 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 28 Oct 2021 23:25:25 +02:00

objtool: Shrink struct instruction

Any one instruction can only ever call a single function, therefore
insn->mcount_loc_node is superfluous and can use insn->call_node.

This shrinks struct instruction, which is by far the most numerous
structure objtool creates.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120309.785456706@infradead.org
---
 tools/objtool/check.c                 | 6 +++---
 tools/objtool/include/objtool/check.h | 1 -
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 8ab6f24..ce3c25f 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -701,7 +701,7 @@ static int create_mcount_loc_sections(struct objtool_file *file)
 		return 0;
 
 	idx = 0;
-	list_for_each_entry(insn, &file->mcount_loc_list, mcount_loc_node)
+	list_for_each_entry(insn, &file->mcount_loc_list, call_node)
 		idx++;
 
 	sec = elf_create_section(file->elf, "__mcount_loc", 0, sizeof(unsigned long), idx);
@@ -709,7 +709,7 @@ static int create_mcount_loc_sections(struct objtool_file *file)
 		return -1;
 
 	idx = 0;
-	list_for_each_entry(insn, &file->mcount_loc_list, mcount_loc_node) {
+	list_for_each_entry(insn, &file->mcount_loc_list, call_node) {
 
 		loc = (unsigned long *)sec->data->d_buf + idx;
 		memset(loc, 0, sizeof(unsigned long));
@@ -1051,7 +1051,7 @@ static void annotate_call_site(struct objtool_file *file,
 
 		insn->type = INSN_NOP;
 
-		list_add_tail(&insn->mcount_loc_node, &file->mcount_loc_list);
+		list_add_tail(&insn->call_node, &file->mcount_loc_list);
 		return;
 	}
 }
diff --git a/tools/objtool/include/objtool/check.h b/tools/objtool/include/objtool/check.h
index 07e99c2..6cfff07 100644
--- a/tools/objtool/include/objtool/check.h
+++ b/tools/objtool/include/objtool/check.h
@@ -40,7 +40,6 @@ struct instruction {
 	struct list_head list;
 	struct hlist_node hash;
 	struct list_head call_node;
-	struct list_head mcount_loc_node;
 	struct section *sec;
 	unsigned long offset;
 	unsigned int len;

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [tip: objtool/core] objtool: Explicitly avoid self modifying code in .altinstr_replacement
  2021-10-26 12:01 ` [PATCH v3 02/16] objtool: Explicitly avoid self modifying code in .altinstr_replacement Peter Zijlstra
  2021-10-28  4:01   ` Josh Poimboeuf
@ 2021-10-29  8:03   ` tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 48+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-10-29  8:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel),
	Borislav Petkov, Josh Poimboeuf, Alexei Starovoitov, x86,
	linux-kernel

The following commit has been merged into the objtool/core branch of tip:

Commit-ID:     dd003edeffa3cb87bc9862582004f405d77d7670
Gitweb:        https://git.kernel.org/tip/dd003edeffa3cb87bc9862582004f405d77d7670
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Tue, 26 Oct 2021 14:01:34 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 28 Oct 2021 23:25:25 +02:00

objtool: Explicitly avoid self modifying code in .altinstr_replacement

Assume ALTERNATIVE()s know what they're doing and do not change, or
cause to change, instructions in .altinstr_replacement sections.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120309.722511775@infradead.org
---
 tools/objtool/check.c | 42 +++++++++++++++++++++++++++++++-----------
 1 file changed, 31 insertions(+), 11 deletions(-)

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index fdbc6d2..8ab6f24 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -993,18 +993,27 @@ static void remove_insn_ops(struct instruction *insn)
 	}
 }
 
-static void add_call_dest(struct objtool_file *file, struct instruction *insn,
-			  struct symbol *dest, bool sibling)
+static void annotate_call_site(struct objtool_file *file,
+			       struct instruction *insn, bool sibling)
 {
 	struct reloc *reloc = insn_reloc(file, insn);
+	struct symbol *sym = insn->call_dest;
 
-	insn->call_dest = dest;
-	if (!dest)
+	if (!sym)
+		sym = reloc->sym;
+
+	/*
+	 * Alternative replacement code is just template code which is
+	 * sometimes copied to the original instruction. For now, don't
+	 * annotate it. (In the future we might consider annotating the
+	 * original instruction if/when it ever makes sense to do so.)
+	 */
+	if (!strcmp(insn->sec->name, ".altinstr_replacement"))
 		return;
 
-	if (insn->call_dest->static_call_tramp) {
-		list_add_tail(&insn->call_node,
-			      &file->static_call_list);
+	if (sym->static_call_tramp) {
+		list_add_tail(&insn->call_node, &file->static_call_list);
+		return;
 	}
 
 	/*
@@ -1012,7 +1021,7 @@ static void add_call_dest(struct objtool_file *file, struct instruction *insn,
 	 * so they need a little help, NOP out any KCOV calls from noinstr
 	 * text.
 	 */
-	if (insn->sec->noinstr && insn->call_dest->kcov) {
+	if (insn->sec->noinstr && sym->kcov) {
 		if (reloc) {
 			reloc->type = R_NONE;
 			elf_write_reloc(file->elf, reloc);
@@ -1024,9 +1033,10 @@ static void add_call_dest(struct objtool_file *file, struct instruction *insn,
 			               : arch_nop_insn(insn->len));
 
 		insn->type = sibling ? INSN_RETURN : INSN_NOP;
+		return;
 	}
 
-	if (mcount && insn->call_dest->fentry) {
+	if (mcount && sym->fentry) {
 		if (sibling)
 			WARN_FUNC("Tail call to __fentry__ !?!?", insn->sec, insn->offset);
 
@@ -1041,9 +1051,17 @@ static void add_call_dest(struct objtool_file *file, struct instruction *insn,
 
 		insn->type = INSN_NOP;
 
-		list_add_tail(&insn->mcount_loc_node,
-			      &file->mcount_loc_list);
+		list_add_tail(&insn->mcount_loc_node, &file->mcount_loc_list);
+		return;
 	}
+}
+
+static void add_call_dest(struct objtool_file *file, struct instruction *insn,
+			  struct symbol *dest, bool sibling)
+{
+	insn->call_dest = dest;
+	if (!dest)
+		return;
 
 	/*
 	 * Whatever stack impact regular CALLs have, should be undone
@@ -1053,6 +1071,8 @@ static void add_call_dest(struct objtool_file *file, struct instruction *insn,
 	 * are converted to JUMP, see read_intra_function_calls().
 	 */
 	remove_insn_ops(insn);
+
+	annotate_call_site(file, insn, sibling);
 }
 
 /*

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [tip: objtool/core] objtool: Classify symbols
  2021-10-26 12:01 ` [PATCH v3 01/16] objtool: Classify symbols Peter Zijlstra
  2021-10-28  3:58   ` Josh Poimboeuf
@ 2021-10-29  8:03   ` tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 48+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-10-29  8:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel),
	Borislav Petkov, Josh Poimboeuf, Alexei Starovoitov, x86,
	linux-kernel

The following commit has been merged into the objtool/core branch of tip:

Commit-ID:     1739c66eb7bd5f27f1b69a5a26e10e8327d1e136
Gitweb:        https://git.kernel.org/tip/1739c66eb7bd5f27f1b69a5a26e10e8327d1e136
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Tue, 26 Oct 2021 14:01:33 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 28 Oct 2021 23:25:24 +02:00

objtool: Classify symbols

In order to avoid calling str*cmp() on symbol names, over and over, do
them all once upfront and store the result.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120309.658539311@infradead.org
---
 tools/objtool/check.c               | 34 ++++++++++++++++++----------
 tools/objtool/include/objtool/elf.h |  7 ++++--
 2 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 7c865a1..fdbc6d2 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1012,8 +1012,7 @@ static void add_call_dest(struct objtool_file *file, struct instruction *insn,
 	 * so they need a little help, NOP out any KCOV calls from noinstr
 	 * text.
 	 */
-	if (insn->sec->noinstr &&
-	    !strncmp(insn->call_dest->name, "__sanitizer_cov_", 16)) {
+	if (insn->sec->noinstr && insn->call_dest->kcov) {
 		if (reloc) {
 			reloc->type = R_NONE;
 			elf_write_reloc(file->elf, reloc);
@@ -1027,7 +1026,7 @@ static void add_call_dest(struct objtool_file *file, struct instruction *insn,
 		insn->type = sibling ? INSN_RETURN : INSN_NOP;
 	}
 
-	if (mcount && !strcmp(insn->call_dest->name, "__fentry__")) {
+	if (mcount && insn->call_dest->fentry) {
 		if (sibling)
 			WARN_FUNC("Tail call to __fentry__ !?!?", insn->sec, insn->offset);
 
@@ -1077,7 +1076,7 @@ static int add_jump_destinations(struct objtool_file *file)
 		} else if (reloc->sym->type == STT_SECTION) {
 			dest_sec = reloc->sym->sec;
 			dest_off = arch_dest_reloc_offset(reloc->addend);
-		} else if (arch_is_retpoline(reloc->sym)) {
+		} else if (reloc->sym->retpoline_thunk) {
 			/*
 			 * Retpoline jumps are really dynamic jumps in
 			 * disguise, so convert them accordingly.
@@ -1218,7 +1217,7 @@ static int add_call_destinations(struct objtool_file *file)
 
 			add_call_dest(file, insn, dest, false);
 
-		} else if (arch_is_retpoline(reloc->sym)) {
+		} else if (reloc->sym->retpoline_thunk) {
 			/*
 			 * Retpoline calls are really dynamic calls in
 			 * disguise, so convert them accordingly.
@@ -1907,17 +1906,28 @@ static int read_intra_function_calls(struct objtool_file *file)
 	return 0;
 }
 
-static int read_static_call_tramps(struct objtool_file *file)
+static int classify_symbols(struct objtool_file *file)
 {
 	struct section *sec;
 	struct symbol *func;
 
 	for_each_sec(file, sec) {
 		list_for_each_entry(func, &sec->symbol_list, list) {
-			if (func->bind == STB_GLOBAL &&
-			    !strncmp(func->name, STATIC_CALL_TRAMP_PREFIX_STR,
+			if (func->bind != STB_GLOBAL)
+				continue;
+
+			if (!strncmp(func->name, STATIC_CALL_TRAMP_PREFIX_STR,
 				     strlen(STATIC_CALL_TRAMP_PREFIX_STR)))
 				func->static_call_tramp = true;
+
+			if (arch_is_retpoline(func))
+				func->retpoline_thunk = true;
+
+			if (!strcmp(func->name, "__fentry__"))
+				func->fentry = true;
+
+			if (!strncmp(func->name, "__sanitizer_cov_", 16))
+				func->kcov = true;
 		}
 	}
 
@@ -1983,7 +1993,7 @@ static int decode_sections(struct objtool_file *file)
 	/*
 	 * Must be before add_{jump_call}_destination.
 	 */
-	ret = read_static_call_tramps(file);
+	ret = classify_symbols(file);
 	if (ret)
 		return ret;
 
@@ -2041,9 +2051,9 @@ static int decode_sections(struct objtool_file *file)
 
 static bool is_fentry_call(struct instruction *insn)
 {
-	if (insn->type == INSN_CALL && insn->call_dest &&
-	    insn->call_dest->type == STT_NOTYPE &&
-	    !strcmp(insn->call_dest->name, "__fentry__"))
+	if (insn->type == INSN_CALL &&
+	    insn->call_dest &&
+	    insn->call_dest->fentry)
 		return true;
 
 	return false;
diff --git a/tools/objtool/include/objtool/elf.h b/tools/objtool/include/objtool/elf.h
index c48c106..c2dbf53 100644
--- a/tools/objtool/include/objtool/elf.h
+++ b/tools/objtool/include/objtool/elf.h
@@ -54,8 +54,11 @@ struct symbol {
 	unsigned long offset;
 	unsigned int len;
 	struct symbol *pfunc, *cfunc, *alias;
-	bool uaccess_safe;
-	bool static_call_tramp;
+	u8 uaccess_safe      : 1;
+	u8 static_call_tramp : 1;
+	u8 retpoline_thunk   : 1;
+	u8 fentry            : 1;
+	u8 kcov              : 1;
 	struct list_head pv_target;
 };
 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic
  2021-10-26 12:01 [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
                   ` (18 preceding siblings ...)
  2021-10-28 17:43 ` Borislav Petkov
@ 2021-10-29 12:43 ` Miroslav Benes
  19 siblings, 0 replies; 48+ messages in thread
From: Miroslav Benes @ 2021-10-29 12:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, jpoimboe, andrew.cooper3, linux-kernel, alexei.starovoitov,
	ndesaulniers, bpf

On Tue, 26 Oct 2021, Peter Zijlstra wrote:

> Hi,
> 
> These patches rewrite the way retpolines are rewritten. Currently objtool emits
> alternative entries for most retpoline calls. However trying to extend that led
> to trouble (ELF files are horrid).
> 
> Therefore completely overhaul this and have objtool emit a .retpoline_sites
> section that lists all compiler generated retpoline thunk calls. Then the
> kernel can do with them as it pleases.
> 
> Notably it will:
> 
>  - rewrite them to indirect instructions for !RETPOLINE
>  - rewrite them to lfence; indirect; for RETPOLINE_AMD,
>    where size allows (boo clang!)
> 
> Specifically, the !RETPOLINE case can now also deal with the clang-special
> conditional-indirect-tail-call:
> 
>   Jcc __x86_indirect_thunk_\reg.
> 
> Finally, also update the x86 BPF jit to catch up to recent times and do these
> same things.
> 
> All this should help improve performance by removing an indirection.
> 
> Patches can (soon) be found here:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git objtool/core
> 
> Changes since v2:
> 
>  - rewrite the __x86_indirect_thunk_array[] stuff again
>  - rewrite the retpoline,amd rewrite logic, it now also supports
>    rewriting the Jcc case, if the original instruction is long enough, but
>    more importantly, it's simpler code.
>  - bpf label simplification patch
>  - random assorted cleanups
>  - actually managed to get bpf selftests working

It is already in tip, but FWIW the objtool changes look good to me

Reviewed-by: Miroslav Benes <mbenes@suse.cz>

M

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2021-10-29 12:43 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-26 12:01 [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic Peter Zijlstra
2021-10-26 12:01 ` [PATCH v3 01/16] objtool: Classify symbols Peter Zijlstra
2021-10-28  3:58   ` Josh Poimboeuf
2021-10-29  8:03   ` [tip: objtool/core] " tip-bot2 for Peter Zijlstra
2021-10-26 12:01 ` [PATCH v3 02/16] objtool: Explicitly avoid self modifying code in .altinstr_replacement Peter Zijlstra
2021-10-28  4:01   ` Josh Poimboeuf
2021-10-29  8:03   ` [tip: objtool/core] " tip-bot2 for Peter Zijlstra
2021-10-26 12:01 ` [PATCH v3 03/16] objtool: Shrink struct instruction Peter Zijlstra
2021-10-29  8:03   ` [tip: objtool/core] " tip-bot2 for Peter Zijlstra
2021-10-26 12:01 ` [PATCH v3 04/16] objtool,x86: Replace alternatives with .retpoline_sites Peter Zijlstra
2021-10-29  8:03   ` [tip: objtool/core] " tip-bot2 for Peter Zijlstra
2021-10-26 12:01 ` [PATCH v3 05/16] x86/retpoline: Remove unused replacement symbols Peter Zijlstra
2021-10-29  8:03   ` [tip: objtool/core] " tip-bot2 for Peter Zijlstra
2021-10-26 12:01 ` [PATCH v3 06/16] x86/asm: Fix register order Peter Zijlstra
2021-10-29  8:03   ` [tip: objtool/core] " tip-bot2 for Peter Zijlstra
2021-10-26 12:01 ` [PATCH v3 07/16] x86/asm: Fixup odd GEN-for-each-reg.h usage Peter Zijlstra
2021-10-29  8:03   ` [tip: objtool/core] " tip-bot2 for Peter Zijlstra
2021-10-26 12:01 ` [PATCH v3 08/16] x86/retpoline: Move the retpoline thunk declarations to nospec-branch.h Peter Zijlstra
2021-10-29  8:03   ` [tip: objtool/core] " tip-bot2 for Peter Zijlstra
2021-10-26 12:01 ` [PATCH v3 09/16] x86/retpoline: Create a retpoline thunk array Peter Zijlstra
2021-10-29  8:03   ` [tip: objtool/core] " tip-bot2 for Peter Zijlstra
2021-10-26 12:01 ` [PATCH v3 10/16] x86/alternative: Implement .retpoline_sites support Peter Zijlstra
2021-10-27 17:38   ` Borislav Petkov
2021-10-27 19:10     ` Peter Zijlstra
2021-10-29  8:02   ` [tip: objtool/core] " tip-bot2 for Peter Zijlstra
2021-10-26 12:01 ` [PATCH v3 11/16] x86/alternative: Handle Jcc __x86_indirect_thunk_\reg Peter Zijlstra
2021-10-28  9:29   ` Borislav Petkov
2021-10-29  8:02   ` [tip: objtool/core] " tip-bot2 for Peter Zijlstra
2021-10-26 12:01 ` [PATCH v3 12/16] x86/alternative: Try inline spectre_v2=retpoline,amd Peter Zijlstra
2021-10-29  8:02   ` [tip: objtool/core] " tip-bot2 for Peter Zijlstra
2021-10-26 12:01 ` [PATCH v3 13/16] x86/alternative: Add debug prints to apply_retpolines() Peter Zijlstra
2021-10-29  8:02   ` [tip: objtool/core] " tip-bot2 for Peter Zijlstra
2021-10-26 12:01 ` [PATCH v3 14/16] x86,bugs: Unconditionally allow spectre_v2=retpoline,amd Peter Zijlstra
2021-10-29  8:02   ` [tip: objtool/core] " tip-bot2 for Peter Zijlstra
2021-10-26 12:01 ` [PATCH v3 15/16] bpf, x86: Simplify computing label offsets Peter Zijlstra
2021-10-29  8:02   ` [tip: objtool/core] bpf,x86: " tip-bot2 for Peter Zijlstra
2021-10-26 12:01 ` [PATCH v3 16/16] bpf,x86: Respect X86_FEATURE_RETPOLINE* Peter Zijlstra
2021-10-29  8:02   ` [tip: objtool/core] " tip-bot2 for Peter Zijlstra
2021-10-26 18:26 ` [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic Alexei Starovoitov
2021-10-26 18:45   ` Peter Zijlstra
2021-10-26 20:00     ` Alexei Starovoitov
2021-10-26 21:05       ` Peter Zijlstra
2021-10-26 21:05         ` Alexei Starovoitov
2021-10-27  9:00           ` Peter Zijlstra
2021-10-27 17:32             ` Alexei Starovoitov
2021-10-28  5:17 ` Josh Poimboeuf
2021-10-28 17:43 ` Borislav Petkov
2021-10-29 12:43 ` Miroslav Benes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).