LKML Archive on lore.kernel.org
* [PATCH 0/3 v11] add reserved e820 ranges to the kdump kernel e820 table
@ 2019-04-23  1:30 Lianbo Jiang
  2019-04-23  1:30 ` [PATCH 1/3 v11] x86/e820, resource: add a new I/O resource descriptor 'IORES_DESC_RESERVED' Lianbo Jiang
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Lianbo Jiang @ 2019-04-23  1:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: kexec, tglx, mingo, bp, akpm, dave.hansen, luto, peterz, x86,
	hpa, dyoung, bhe, Thomas.Lendacky

This patchset did three things:

a). x86/e820, resource: add a new I/O resource descriptor
    'IORES_DESC_RESERVED'

b). x86/mm: change the check condition in SEV because a new descriptor is
    introduced

c). x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table

Changes since v1:
1. Modified the value of flags to "0", when walking through the whole
tree for e820 reserved ranges.

Changes since v2:
1. Modified the value of flags to "0", when walking through the whole
tree for e820 reserved ranges.
2. Modified the invalid SOB chain issue.

Changes since v3:
1. Dropped [PATCH 1/3 v3] resource: fix an error which walks through iomem
   resources. Please refer to this commit <010a93bf97c7> "resource: Fix
   find_next_iomem_res() iteration issue"

Changes since v4:
1. Improve the patch log, and add kernel log.

Changes since v5:
1. Rewrite the patch logs.

Changes since v6:
1. Modify [PATCH 1/2]: add the new I/O resource descriptor
   'IORES_DESC_RESERVED' for the iomem resource search interfaces,
   and also update the code related to 'IORES_DESC_NONE'.
2. Modify [PATCH 2/2]: walk through the I/O resources based on the
   new descriptor 'IORES_DESC_RESERVED'.
3. Update patch log.

Changes since v7:
1. Improve patch log.
2. Improve the function __ioremap_check_desc_other().
3. Modify the code comment in __ioremap_check_desc_other().

Changes since v8:
1. Get rid of all changes to ia64 (Borislav's suggestion).
2. Change the examination condition to the 'IORES_DESC_ACPI_*'.
3. Modify the signature. This patch (add the new I/O resource
   descriptor 'IORES_DESC_RESERVED') was suggested by Boris.

Changes since v9:
1. Improve patch log.
2. There is no need to modify kernel/resource.c, so drop those changes.
3. Rename __ioremap_check_desc_other() to
   __ioremap_check_desc_none_and_reserved(), modify the
   check condition, and add a comment above it.

Changes since v10:
1. Split the series into three patches; the second patch is newly added.
2. Change struct ioremap_mem_flags to struct ioremap_desc and redefine
   it.
3. Rename __ioremap_check_desc_other() to __ioremap_check_desc().
4. Change the check condition in SEV and also improve it.
5. Modify the return value of some functions.

Lianbo Jiang (3):
  x86/e820, resource: add a new I/O resource descriptor
    'IORES_DESC_RESERVED'
  x86/mm: change the check condition in SEV because a new descriptor is
    introduced
  x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table

 arch/x86/kernel/crash.c |  6 +++++
 arch/x86/kernel/e820.c  |  2 +-
 arch/x86/mm/ioremap.c   | 59 ++++++++++++++++++++++++++---------------
 include/linux/ioport.h  | 10 +++++++
 4 files changed, 54 insertions(+), 23 deletions(-)

-- 
2.17.1



* [PATCH 1/3 v11] x86/e820, resource: add a new I/O resource descriptor 'IORES_DESC_RESERVED'
  2019-04-23  1:30 [PATCH 0/3 v11] add reserved e820 ranges to the kdump kernel e820 table Lianbo Jiang
@ 2019-04-23  1:30 ` Lianbo Jiang
  2019-06-20  9:59   ` [tip:x86/kdump] x86/e820, ioport: Add a new I/O resource descriptor IORES_DESC_RESERVED tip-bot for Lianbo Jiang
  2019-04-23  1:30 ` [PATCH 2/3 v11] x86/mm: change the check condition in SEV because a new descriptor is introduced Lianbo Jiang
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 25+ messages in thread
From: Lianbo Jiang @ 2019-04-23  1:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: kexec, tglx, mingo, bp, akpm, dave.hansen, luto, peterz, x86,
	hpa, dyoung, bhe, Thomas.Lendacky

When doing kexec_file_load(), the first kernel needs to pass the e820
reserved ranges to the second kernel, because some devices, such as PCI
devices, may use them in the kdump kernel.

However, the kernel cannot exactly match the e820 reserved ranges when
walking through the iomem resources via 'IORES_DESC_NONE', because
several e820 types are described as 'IORES_DESC_NONE'. Please refer
to e820_type_to_iores_desc().

Therefore, add a new I/O resource descriptor 'IORES_DESC_RESERVED' for
the iomem resource search interfaces. It makes it possible to exactly
match the reserved resource ranges when walking through the iomem
resources.

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
---
 arch/x86/kernel/e820.c | 2 +-
 include/linux/ioport.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 2879e234e193..16fcde196243 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1050,10 +1050,10 @@ static unsigned long __init e820_type_to_iores_desc(struct e820_entry *entry)
 	case E820_TYPE_NVS:		return IORES_DESC_ACPI_NV_STORAGE;
 	case E820_TYPE_PMEM:		return IORES_DESC_PERSISTENT_MEMORY;
 	case E820_TYPE_PRAM:		return IORES_DESC_PERSISTENT_MEMORY_LEGACY;
+	case E820_TYPE_RESERVED:	return IORES_DESC_RESERVED;
 	case E820_TYPE_RESERVED_KERN:	/* Fall-through: */
 	case E820_TYPE_RAM:		/* Fall-through: */
 	case E820_TYPE_UNUSABLE:	/* Fall-through: */
-	case E820_TYPE_RESERVED:	/* Fall-through: */
 	default:			return IORES_DESC_NONE;
 	}
 }
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index da0ebaec25f0..6ed59de48bd5 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -133,6 +133,7 @@ enum {
 	IORES_DESC_PERSISTENT_MEMORY_LEGACY	= 5,
 	IORES_DESC_DEVICE_PRIVATE_MEMORY	= 6,
 	IORES_DESC_DEVICE_PUBLIC_MEMORY		= 7,
+	IORES_DESC_RESERVED			= 8,
 };
 
 /* helpers to define resources */
-- 
2.17.1
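
The effect of the e820.c hunk above can be modelled in user space. This is
only a sketch under stated assumptions: the enum names mirror the kernel's,
but their numeric values are illustrative, and E820_TYPE_COUNT,
type_to_desc() and types_sharing_desc() are hypothetical helpers that do not
exist in the kernel. It shows that IORES_DESC_NONE is ambiguous before the
patch and that IORES_DESC_RESERVED identifies exactly one e820 type after it.

```c
#include <assert.h>
#include <stddef.h>

/* User-space model only: names mirror the kernel, values are illustrative. */
enum e820_type {
	E820_TYPE_RAM,
	E820_TYPE_RESERVED,
	E820_TYPE_ACPI,
	E820_TYPE_NVS,
	E820_TYPE_UNUSABLE,
	E820_TYPE_PMEM,
	E820_TYPE_PRAM,
	E820_TYPE_RESERVED_KERN,
	E820_TYPE_COUNT		/* hypothetical sentinel, not in the kernel */
};

enum iores_desc {
	IORES_DESC_NONE,
	IORES_DESC_ACPI_TABLES,
	IORES_DESC_ACPI_NV_STORAGE,
	IORES_DESC_PERSISTENT_MEMORY,
	IORES_DESC_PERSISTENT_MEMORY_LEGACY,
	IORES_DESC_RESERVED
};

/*
 * Models e820_type_to_iores_desc(). With have_reserved_desc == 0 it
 * behaves like the code before the patch, with 1 like the patched code.
 */
static enum iores_desc type_to_desc(enum e820_type type, int have_reserved_desc)
{
	switch (type) {
	case E820_TYPE_ACPI:	return IORES_DESC_ACPI_TABLES;
	case E820_TYPE_NVS:	return IORES_DESC_ACPI_NV_STORAGE;
	case E820_TYPE_PMEM:	return IORES_DESC_PERSISTENT_MEMORY;
	case E820_TYPE_PRAM:	return IORES_DESC_PERSISTENT_MEMORY_LEGACY;
	case E820_TYPE_RESERVED:
		return have_reserved_desc ? IORES_DESC_RESERVED : IORES_DESC_NONE;
	default:		return IORES_DESC_NONE;
	}
}

/* Count how many e820 types end up with the given descriptor. */
static size_t types_sharing_desc(enum iores_desc desc, int have_reserved_desc)
{
	size_t n = 0;

	for (int t = 0; t < E820_TYPE_COUNT; t++)
		if (type_to_desc((enum e820_type)t, have_reserved_desc) == desc)
			n++;
	return n;
}
```

In this model, four types (RAM, RESERVED, UNUSABLE, RESERVED_KERN) share
IORES_DESC_NONE before the patch, so a walk keyed on that descriptor cannot
single out the reserved ranges.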



* [PATCH 2/3 v11] x86/mm: change the check condition in SEV because a new descriptor is introduced
  2019-04-23  1:30 [PATCH 0/3 v11] add reserved e820 ranges to the kdump kernel e820 table Lianbo Jiang
  2019-04-23  1:30 ` [PATCH 1/3 v11] x86/e820, resource: add a new I/O resource descriptor 'IORES_DESC_RESERVED' Lianbo Jiang
@ 2019-04-23  1:30 ` Lianbo Jiang
  2019-06-20 10:00   ` [tip:x86/kdump] x86/mm: Rework ioremap resource mapping determination tip-bot for Lianbo Jiang
  2019-04-23  1:30 ` [PATCH 3/3 v11] x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table Lianbo Jiang
  2019-05-28  7:30 ` [PATCH 0/3 v11] add reserved e820 ranges to the kdump kernel " lijiang
  3 siblings, 1 reply; 25+ messages in thread
From: Lianbo Jiang @ 2019-04-23  1:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: kexec, tglx, mingo, bp, akpm, dave.hansen, luto, peterz, x86,
	hpa, dyoung, bhe, Thomas.Lendacky

Originally, areas described as IORES_DESC_NONE are not mapped
encrypted in SEV when using ioremap(). The code checks for a resource
that is not described as IORES_DESC_NONE, which ensures that the
reserved areas are not mapped encrypted when using ioremap().

Now that a new descriptor, IORES_DESC_RESERVED, has been created for
the reserved areas, areas described as IORES_DESC_{NONE,RESERVED}
should likewise not be mapped encrypted in SEV when using ioremap().

Therefore, modify the check condition in SEV and improve it.

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
---
 arch/x86/mm/ioremap.c  | 59 ++++++++++++++++++++++++++----------------
 include/linux/ioport.h |  9 +++++++
 2 files changed, 46 insertions(+), 22 deletions(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index dd73d5d74393..82be5707124b 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -27,9 +27,8 @@
 
 #include "physaddr.h"
 
-struct ioremap_mem_flags {
-	bool system_ram;
-	bool desc_other;
+struct ioremap_desc {
+	unsigned int flags;
 };
 
 /*
@@ -61,13 +60,13 @@ int ioremap_change_attr(unsigned long vaddr, unsigned long size,
 	return err;
 }
 
-static bool __ioremap_check_ram(struct resource *res)
+static unsigned int __ioremap_check_ram(struct resource *res)
 {
 	unsigned long start_pfn, stop_pfn;
 	unsigned long i;
 
 	if ((res->flags & IORESOURCE_SYSTEM_RAM) != IORESOURCE_SYSTEM_RAM)
-		return false;
+		return 0;
 
 	start_pfn = (res->start + PAGE_SIZE - 1) >> PAGE_SHIFT;
 	stop_pfn = (res->end + 1) >> PAGE_SHIFT;
@@ -75,28 +74,44 @@ static bool __ioremap_check_ram(struct resource *res)
 		for (i = 0; i < (stop_pfn - start_pfn); ++i)
 			if (pfn_valid(start_pfn + i) &&
 			    !PageReserved(pfn_to_page(start_pfn + i)))
-				return true;
+				return IORES_MAP_SYSTEM_RAM;
 	}
 
-	return false;
+	return 0;
 }
 
-static int __ioremap_check_desc_other(struct resource *res)
+/*
+ * NONE and RESERVED should not be mapped encrypted in SEV because there
+ * the whole memory is already encrypted.
+ */
+static unsigned int __ioremap_check_desc(struct resource *res)
 {
-	return (res->desc != IORES_DESC_NONE);
+	if (!sev_active())
+		return 0;
+
+	switch (res->desc) {
+	case IORES_DESC_NONE:
+	case IORES_DESC_RESERVED:
+		break;
+	default:
+		return IORES_MAP_ENCRYPTED;
+	}
+
+	return 0;
 }
 
 static int __ioremap_res_check(struct resource *res, void *arg)
 {
-	struct ioremap_mem_flags *flags = arg;
+	struct ioremap_desc *desc = arg;
 
-	if (!flags->system_ram)
-		flags->system_ram = __ioremap_check_ram(res);
+	if (!(desc->flags & IORES_MAP_SYSTEM_RAM))
+		desc->flags |= __ioremap_check_ram(res);
 
-	if (!flags->desc_other)
-		flags->desc_other = __ioremap_check_desc_other(res);
+	if (!(desc->flags & IORES_MAP_ENCRYPTED))
+		desc->flags |= __ioremap_check_desc(res);
 
-	return flags->system_ram && flags->desc_other;
+	return ((desc->flags & (IORES_MAP_SYSTEM_RAM | IORES_MAP_ENCRYPTED)) ==
+		(IORES_MAP_SYSTEM_RAM | IORES_MAP_ENCRYPTED));
 }
 
 /*
@@ -105,15 +120,15 @@ static int __ioremap_res_check(struct resource *res, void *arg)
  * resource described not as IORES_DESC_NONE (e.g. IORES_DESC_ACPI_TABLES).
  */
 static void __ioremap_check_mem(resource_size_t addr, unsigned long size,
-				struct ioremap_mem_flags *flags)
+				struct ioremap_desc *desc)
 {
 	u64 start, end;
 
 	start = (u64)addr;
 	end = start + size - 1;
-	memset(flags, 0, sizeof(*flags));
+	memset(desc, 0, sizeof(struct ioremap_desc));
 
-	walk_mem_res(start, end, flags, __ioremap_res_check);
+	walk_mem_res(start, end, desc, __ioremap_res_check);
 }
 
 /*
@@ -138,7 +153,7 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
 	resource_size_t last_addr;
 	const resource_size_t unaligned_phys_addr = phys_addr;
 	const unsigned long unaligned_size = size;
-	struct ioremap_mem_flags mem_flags;
+	struct ioremap_desc io_desc;
 	struct vm_struct *area;
 	enum page_cache_mode new_pcm;
 	pgprot_t prot;
@@ -157,12 +172,12 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
 		return NULL;
 	}
 
-	__ioremap_check_mem(phys_addr, size, &mem_flags);
+	__ioremap_check_mem(phys_addr, size, &io_desc);
 
 	/*
 	 * Don't allow anybody to remap normal RAM that we're using..
 	 */
-	if (mem_flags.system_ram) {
+	if (io_desc.flags & IORES_MAP_SYSTEM_RAM) {
 		WARN_ONCE(1, "ioremap on RAM at %pa - %pa\n",
 			  &phys_addr, &last_addr);
 		return NULL;
@@ -200,7 +215,7 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
 	 * resulting mapping.
 	 */
 	prot = PAGE_KERNEL_IO;
-	if ((sev_active() && mem_flags.desc_other) || encrypted)
+	if ((io_desc.flags & IORES_MAP_ENCRYPTED) || encrypted)
 		prot = pgprot_encrypted(prot);
 
 	switch (pcm) {
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 6ed59de48bd5..5db386cfc2d4 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -12,6 +12,7 @@
 #ifndef __ASSEMBLY__
 #include <linux/compiler.h>
 #include <linux/types.h>
+#include <linux/bits.h>
 /*
  * Resources are tree-like, allowing
  * nesting etc..
@@ -136,6 +137,14 @@ enum {
 	IORES_DESC_RESERVED			= 8,
 };
 
+/*
+ * Flags controlling ioremap() behavior.
+ */
+enum {
+	IORES_MAP_SYSTEM_RAM		= BIT(0),
+	IORES_MAP_ENCRYPTED		= BIT(1),
+};
+
 /* helpers to define resources */
 #define DEFINE_RES_NAMED(_start, _size, _name, _flags)			\
 	{								\
-- 
2.17.1
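
The flag accumulation done by the reworked __ioremap_res_check() can be
sketched in user space as follows. This is a simplified model, not the
kernel code: struct demo_res, check_ram(), check_desc() and walk_flags() are
hypothetical names, the page-level RAM check is reduced to a boolean, the
sev_active() result is passed in as a parameter, and the descriptor values
are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BIT(n) (1u << (n))

/* Same flag layout as the enum added to include/linux/ioport.h. */
enum {
	IORES_MAP_SYSTEM_RAM	= BIT(0),
	IORES_MAP_ENCRYPTED	= BIT(1),
};

/* Illustrative subset of descriptors; values are not the kernel's. */
enum iores_desc {
	IORES_DESC_NONE,
	IORES_DESC_ACPI_TABLES,
	IORES_DESC_RESERVED
};

/* Hypothetical, simplified resource used only for this sketch. */
struct demo_res {
	bool usable_ram;	/* stands in for the SYSTEM_RAM + page checks */
	enum iores_desc desc;
};

static unsigned int check_ram(const struct demo_res *res)
{
	return res->usable_ram ? IORES_MAP_SYSTEM_RAM : 0;
}

/* NONE and RESERVED are never mapped encrypted; anything else is, under SEV. */
static unsigned int check_desc(const struct demo_res *res, bool sev)
{
	if (!sev)
		return 0;

	switch (res->desc) {
	case IORES_DESC_NONE:
	case IORES_DESC_RESERVED:
		return 0;
	default:
		return IORES_MAP_ENCRYPTED;
	}
}

/* Walk the resources, OR-ing flags together and stopping once both are set. */
static unsigned int walk_flags(const struct demo_res *res, size_t n, bool sev)
{
	const unsigned int all = IORES_MAP_SYSTEM_RAM | IORES_MAP_ENCRYPTED;
	unsigned int flags = 0;

	for (size_t i = 0; i < n; i++) {
		flags |= check_ram(&res[i]);
		flags |= check_desc(&res[i], sev);
		if ((flags & all) == all)
			break;
	}
	return flags;
}
```

Folding the sev_active() test into the descriptor check is what lets the
caller test a single IORES_MAP_ENCRYPTED bit instead of combining two
conditions.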



* [PATCH 3/3 v11] x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table
  2019-04-23  1:30 [PATCH 0/3 v11] add reserved e820 ranges to the kdump kernel e820 table Lianbo Jiang
  2019-04-23  1:30 ` [PATCH 1/3 v11] x86/e820, resource: add a new I/O resource descriptor 'IORES_DESC_RESERVED' Lianbo Jiang
  2019-04-23  1:30 ` [PATCH 2/3 v11] x86/mm: change the check condition in SEV because a new descriptor is introduced Lianbo Jiang
@ 2019-04-23  1:30 ` Lianbo Jiang
  2019-06-20 10:01   ` [tip:x86/kdump] x86/crash: Add e820 reserved ranges to kdump kernel's " tip-bot for Lianbo Jiang
  2019-05-28  7:30 ` [PATCH 0/3 v11] add reserved e820 ranges to the kdump kernel " lijiang
  3 siblings, 1 reply; 25+ messages in thread
From: Lianbo Jiang @ 2019-04-23  1:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: kexec, tglx, mingo, bp, akpm, dave.hansen, luto, peterz, x86,
	hpa, dyoung, bhe, Thomas.Lendacky

At present, when using the kexec_file_load() syscall to load the kernel
image and initramfs (for example: kexec -s -p xxx), the kernel does not
pass the e820 reserved ranges to the second kernel, which might cause
two problems:

The first one is the MMCONFIG issue. The basic problem is that this
device is in PCI segment 1 and the kernel PCI probing can not find it
without all the e820 I/O reservations being present in the e820 table.
And the kdump kernel does not have those reservations because the kexec
command does not pass the I/O reservation via the "memmap=xxx" command
line option. (This problem does not show up for other vendors, as SGI
is apparently the only one using extended PCI. The lookup of devices in
PCI segment 0 actually fails for everyone, but devices in segment 0 are
then found by some legacy lookup method.) The workaround for this is to
pass the I/O reserved regions to the kdump kernel.

MMCONFIG (aka ECAM) space is described in the ACPI MCFG table. If you don't
have ECAM: (a) PCI devices won't work at all on non-x86 systems that use
only ECAM for config access, (b) you won't be able to access devices on
non-0 segments, (c) you won't be able to access extended config space
(addresses 0x100-0xfff), which means none of the Extended Capabilities will
be available (AER, ACS, ATS, etc). [Bjorn's comment]
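
To make points (b) and (c) concrete, here is a small sketch of how an ECAM
offset is formed, assuming the standard PCIe layout of 4 KiB of config
space per function. ecam_offset() and legacy_can_reach() are hypothetical
helpers written for illustration; they are not kernel functions, and the
legacy helper only models the reach of the 0xCF8/0xCFC port mechanism.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Offset of a config register inside one segment's ECAM window:
 * bus in bits 27:20, device in bits 19:15, function in bits 14:12,
 * register in bits 11:0. The window base itself comes from the MCFG table.
 */
static uint64_t ecam_offset(uint8_t bus, uint8_t dev, uint8_t fn, uint16_t reg)
{
	assert(dev < 32 && fn < 8 && reg < 4096);

	return ((uint64_t)bus << 20) | ((uint64_t)dev << 15) |
	       ((uint64_t)fn << 12) | reg;
}

/*
 * The legacy port-I/O mechanism has no segment field and only addresses
 * the first 256 bytes of config space.
 */
static bool legacy_can_reach(uint16_t segment, uint16_t reg)
{
	return segment == 0 && reg < 0x100;
}
```

A device in segment 1, or an extended capability at offset 0x100 or above,
falls outside what the legacy helper can reach, which is why the kdump
kernel needs a working ECAM mapping.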

The second issue is that the SME kdump kernel doesn't work without the
e820 reserved ranges. When SME is active in the kdump kernel, those
reserved regions are actually still decrypted, but because they are not
present at all in the kdump kernel's e820 table, they are wrongly
treated as encrypted.

The e820 reserved ranges are useful in the kdump kernel, so it is necessary
to pass them to the kdump kernel.

Suggested-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
---
 arch/x86/kernel/crash.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 17ffc869cab8..1db2754df9e9 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -381,6 +381,12 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params)
 	walk_iomem_res_desc(IORES_DESC_ACPI_NV_STORAGE, flags, 0, -1, &cmd,
 			memmap_entry_callback);
 
+	/* Add e820 reserved ranges */
+	cmd.type = E820_TYPE_RESERVED;
+	flags = IORESOURCE_MEM;
+	walk_iomem_res_desc(IORES_DESC_RESERVED, flags, 0, -1, &cmd,
+			   memmap_entry_callback);
+
 	/* Add crashk_low_res region */
 	if (crashk_low_res.end) {
 		ei.addr = crashk_low_res.start;
-- 
2.17.1
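
What the added walk_iomem_res_desc() call achieves can be modelled in user
space as below. This is a sketch under stated assumptions: the demo_*
structures, DEMO_MAX and add_ranges_by_desc() are hypothetical stand-ins for
the resource tree, the boot_params e820 table and memmap_entry_callback();
the descriptor values are illustrative, and the address ranges in the usage
note are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX 8	/* hypothetical table capacity */

/* Illustrative subset of descriptors; values are not the kernel's. */
enum iores_desc {
	IORES_DESC_NONE,
	IORES_DESC_ACPI_TABLES,
	IORES_DESC_RESERVED
};

enum e820_type {
	E820_TYPE_RAM		= 1,
	E820_TYPE_RESERVED	= 2,
	E820_TYPE_ACPI		= 3
};

struct demo_res {
	uint64_t start, end;	/* inclusive range, like struct resource */
	enum iores_desc desc;
};

struct demo_e820_entry {
	uint64_t addr, size;
	enum e820_type type;
};

struct demo_e820_table {
	struct demo_e820_entry entries[DEMO_MAX];
	size_t nr;
};

/*
 * Append every resource whose descriptor matches 'desc' to the table,
 * tagged with 'type'. Returns the number of entries added.
 */
static size_t add_ranges_by_desc(const struct demo_res *res, size_t n,
				 enum iores_desc desc, enum e820_type type,
				 struct demo_e820_table *table)
{
	size_t added = 0;

	for (size_t i = 0; i < n; i++) {
		if (res[i].desc != desc)
			continue;
		if (table->nr == DEMO_MAX)
			break;

		table->entries[table->nr++] = (struct demo_e820_entry){
			.addr = res[i].start,
			.size = res[i].end - res[i].start + 1,
			.type = type,
		};
		added++;
	}
	return added;
}
```

Calling add_ranges_by_desc() with IORES_DESC_RESERVED and E820_TYPE_RESERVED
on a list that mixes NONE, ACPI and RESERVED resources copies only the
reserved ranges, which is the exact match that the new descriptor from
patch 1 makes possible.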



* Re: [PATCH 0/3 v11] add reserved e820 ranges to the kdump kernel e820 table
  2019-04-23  1:30 [PATCH 0/3 v11] add reserved e820 ranges to the kdump kernel e820 table Lianbo Jiang
                   ` (2 preceding siblings ...)
  2019-04-23  1:30 ` [PATCH 3/3 v11] x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table Lianbo Jiang
@ 2019-05-28  7:30 ` lijiang
  2019-06-07 17:42   ` Borislav Petkov
  3 siblings, 1 reply; 25+ messages in thread
From: lijiang @ 2019-05-28  7:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: kexec, tglx, mingo, bp, akpm, dave.hansen, luto, peterz, x86,
	hpa, dyoung, bhe, Thomas.Lendacky

Hi, Boris and Thomas

Could you give me any suggestions about this patch series? Other reviewers?

Thanks.
Lianbo



* Re: [PATCH 0/3 v11] add reserved e820 ranges to the kdump kernel e820 table
  2019-05-28  7:30 ` [PATCH 0/3 v11] add reserved e820 ranges to the kdump kernel " lijiang
@ 2019-06-07 17:42   ` Borislav Petkov
  2019-06-08  3:54     ` Baoquan He
  2019-06-09  4:02     ` lijiang
  0 siblings, 2 replies; 25+ messages in thread
From: Borislav Petkov @ 2019-06-07 17:42 UTC (permalink / raw)
  To: lijiang
  Cc: linux-kernel, kexec, tglx, mingo, akpm, dave.hansen, luto,
	peterz, x86, hpa, dyoung, bhe, Thomas.Lendacky

On Tue, May 28, 2019 at 03:30:21PM +0800, lijiang wrote:
> Hi, Boris and Thomas
> 
> Could you give me any suggestions about this patch series? Other reviewers?

So I'm testing this on a box with SME enabled but after loading the
crash kernel, it freezes instead of rebooting. My cmdline is:

 kexec -s -p /boot/vmlinuz-5.2.0-rc3+ --initrd=/boot/initrd.img-5.2.0-rc3+ --command-line="maxcpus=1 root=/dev/sda5 ro debug ignore_loglevel log_buf_len=16M no_console_suspend net.ifnames=0 systemd.log_target=null mem_encrypt=on kvm_amd.sev=1 nr_cpus=1 irqpoll reset_devices vga=normal LANG=en_US.UTF-8 earlyprintk=serial cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug transparent_hugepage=never disable_cpu_apicid=0"

and the reserved range is:

[    0.000000] Reserving 256MB of memory at 3392MB for crashkernel (System RAM: 16271MB)

I'm wondering if it is related to

https://lkml.kernel.org/r/20190604134952.GC26891@MiWiFi-R3L-srv

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH 0/3 v11] add reserved e820 ranges to the kdump kernel e820 table
  2019-06-07 17:42   ` Borislav Petkov
@ 2019-06-08  3:54     ` Baoquan He
  2019-06-08  9:10       ` Borislav Petkov
  2019-06-09  4:02     ` lijiang
  1 sibling, 1 reply; 25+ messages in thread
From: Baoquan He @ 2019-06-08  3:54 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: lijiang, linux-kernel, kexec, tglx, mingo, akpm, dave.hansen,
	luto, peterz, x86, hpa, dyoung, Thomas.Lendacky

On 06/07/19 at 07:42pm, Borislav Petkov wrote:
> On Tue, May 28, 2019 at 03:30:21PM +0800, lijiang wrote:
> > Hi, Boris and Thomas
> > 
> > Could you give me any suggestions about this patch series? Other reviewers?
> 
> So I'm testing this on a box with SME enabled but after loading the
> crash kernel, it freezes instead of rebooting. My cmdline is:
> 
>  kexec -s -p /boot/vmlinuz-5.2.0-rc3+ --initrd=/boot/initrd.img-5.2.0-rc3+ --command-line="maxcpus=1 root=/dev/sda5 ro debug ignore_loglevel log_buf_len=16M no_console_suspend net.ifnames=0 systemd.log_target=null mem_encrypt=on kvm_amd.sev=1 nr_cpus=1 irqpoll reset_devices vga=normal LANG=en_US.UTF-8 earlyprintk=serial cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug transparent_hugepage=never disable_cpu_apicid=0"
> 
> and the reserved range is:
> 
> [    0.000000] Reserving 256MB of memory at 3392MB for crashkernel (System RAM: 16271MB)

Is it a UEFI box? If it's a UEFI machine, it should be related to the issue
below, because kexec always fails to randomly choose a new position for the
kernel.

The current kexec code fills boot_params->efi_info->efi_loader_signature,
but doesn't construct the efi_memmap table. The kexec/kdump kernel will
always fail to find an available slot for KASLR in process_efi_entries().




* Re: [PATCH 0/3 v11] add reserved e820 ranges to the kdump kernel e820 table
  2019-06-08  3:54     ` Baoquan He
@ 2019-06-08  9:10       ` Borislav Petkov
  2019-06-08 10:01         ` Baoquan He
  0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2019-06-08  9:10 UTC (permalink / raw)
  To: Baoquan He
  Cc: lijiang, linux-kernel, kexec, tglx, mingo, akpm, dave.hansen,
	luto, peterz, x86, hpa, dyoung, Thomas.Lendacky

On Sat, Jun 08, 2019 at 11:54:51AM +0800, Baoquan He wrote:
> Is it a UEFI box?

Yes.

> If it's a UEFI machine, it should be related to the issue below, because
> kexec always fails to randomly choose a new position for the kernel.

The kernel succeeds in selecting a position for the kernel - the kexec
kernel doesn't load when a panic happens. Rather, the box panics and
nothing more.

> The current kexec code fills boot_params->efi_info->efi_loader_signature,
> but doesn't construct the efi_memmap table. The kexec/kdump kernel will
> always fail to find an available slot for KASLR in process_efi_entries().

Kernel has

# CONFIG_RANDOMIZE_BASE is not set

so no KASLR.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH 0/3 v11] add reserved e820 ranges to the kdump kernel e820 table
  2019-06-08  9:10       ` Borislav Petkov
@ 2019-06-08 10:01         ` Baoquan He
  2019-06-08 10:06           ` Borislav Petkov
  0 siblings, 1 reply; 25+ messages in thread
From: Baoquan He @ 2019-06-08 10:01 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: lijiang, linux-kernel, kexec, tglx, mingo, akpm, dave.hansen,
	luto, peterz, x86, hpa, dyoung, Thomas.Lendacky

On 06/08/19 at 11:10am, Borislav Petkov wrote:
> On Sat, Jun 08, 2019 at 11:54:51AM +0800, Baoquan He wrote:
> > Is it a UEFI box?
> 
> Yes.

OK, so the UEFI issue doesn't apply, since CONFIG_RANDOMIZE_BASE is not set.


> 
> > If it's a UEFI machine, it should be related to the issue below, because
> > kexec always fails to randomly choose a new position for the kernel.
> 
> The kernel succeeds in selecting a position for the kernel - the kexec
> kernel doesn't load when a panic happens. Rather, the box panics and
> nothing more.

OK, it may be different from the case we met, if the panic happened when
loading a kdump kernel.

We can load with 'kexec -l' or 'kexec -p', but can't boot after triggering
a crash or executing 'kexec -e' to do the kexec jump.


* Re: [PATCH 0/3 v11] add reserved e820 ranges to the kdump kernel e820 table
  2019-06-08 10:01         ` Baoquan He
@ 2019-06-08 10:06           ` Borislav Petkov
  2019-06-08 10:26             ` Baoquan He
  0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2019-06-08 10:06 UTC (permalink / raw)
  To: Baoquan He
  Cc: lijiang, linux-kernel, kexec, tglx, mingo, akpm, dave.hansen,
	luto, peterz, x86, hpa, dyoung, Thomas.Lendacky

On Sat, Jun 08, 2019 at 06:01:39PM +0800, Baoquan He wrote:
> OK, it may be different from the case we met, if the panic happened when
> loading a kdump kernel.
> 
> We can load with 'kexec -l' or 'kexec -p', but can't boot after triggering
> a crash or executing 'kexec -e' to do the kexec jump.

No, I load a kdump kernel properly with this command:

 kexec -s -p /boot/vmlinuz-5.2.0-rc3+ --initrd=/boot/initrd.img-5.2.0-rc3+ --command-line="maxcpus=1 root=/dev/sda5 ro debug ignore_loglevel
log_buf_len=16M no_console_suspend net.ifnames=0 systemd.log_target=null mem_encrypt=on kvm_amd.sev=1 nr_cpus=1 irqpoll reset_devices vga=normal
LANG=en_US.UTF-8 earlyprintk=serial cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug
transparent_hugepage=never disable_cpu_apicid=0"

And that succeeds judging from

$ grep . /sys/kernel/kexec_*

Then I trigger a panic with

echo c > /proc/sysrq-trigger

and this is where it hangs and doesn't load the kdump kernel.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH 0/3 v11] add reserved e820 ranges to the kdump kernel e820 table
  2019-06-08 10:06           ` Borislav Petkov
@ 2019-06-08 10:26             ` Baoquan He
  2019-06-10 11:37               ` Borislav Petkov
  0 siblings, 1 reply; 25+ messages in thread
From: Baoquan He @ 2019-06-08 10:26 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: lijiang, linux-kernel, kexec, tglx, mingo, akpm, dave.hansen,
	luto, peterz, x86, hpa, dyoung, Thomas.Lendacky

On 06/08/19 at 12:06pm, Borislav Petkov wrote:
> On Sat, Jun 08, 2019 at 06:01:39PM +0800, Baoquan He wrote:
> > OK, it may be different from the case we met, if the panic happened when
> > loading a kdump kernel.
> > 
> > We can load with 'kexec -l' or 'kexec -p', but can't boot after triggering
> > a crash or executing 'kexec -e' to do the kexec jump.
> 
> No, I load a kdump kernel properly with this command:
> 
>  kexec -s -p /boot/vmlinuz-5.2.0-rc3+ --initrd=/boot/initrd.img-5.2.0-rc3+ --command-line="maxcpus=1 root=/dev/sda5 ro debug ignore_loglevel
> log_buf_len=16M no_console_suspend net.ifnames=0 systemd.log_target=null mem_encrypt=on kvm_amd.sev=1 nr_cpus=1 irqpoll reset_devices vga=normal
> LANG=en_US.UTF-8 earlyprintk=serial cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug
> transparent_hugepage=never disable_cpu_apicid=0"
> 
> And that succeeds judging from
> 
> $ grep . /sys/kernel/kexec_*
> 
> Then I trigger a panic with
> 
> echo c > /proc/sysrq-trigger
> 
> and this is where it hangs and doesn't load the kdump kernel.

OK, I see. Then it should be the issue we have met and discussed with
Tom.
https://lkml.kernel.org/r/20190604134952.GC26891@MiWiFi-R3L-srv

You can apply Tom's patch below. I tested it: it lets the kexec kernel
boot successfully, but the kdump kernel still fails to boot. The kdump
kernel boots until the end of kernel initialization, then hangs with a
call trace. I have pasted the log in the above thread. I haven't found
the cause yet.
http://lkml.kernel.org/r/508c2853-dc4f-70a6-6fa8-97c950dc31c6@amd.com



* Re: [PATCH 0/3 v11] add reserved e820 ranges to the kdump kernel e820 table
  2019-06-07 17:42   ` Borislav Petkov
  2019-06-08  3:54     ` Baoquan He
@ 2019-06-09  4:02     ` lijiang
  1 sibling, 0 replies; 25+ messages in thread
From: lijiang @ 2019-06-09  4:02 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, kexec, tglx, mingo, akpm, dave.hansen, luto,
	peterz, x86, hpa, dyoung, bhe, Thomas.Lendacky

On 2019-06-08 01:42, Borislav Petkov wrote:
> On Tue, May 28, 2019 at 03:30:21PM +0800, lijiang wrote:
>> Hi, Boris and Thomas
>>
>> Could you give me any suggestions about this patch series? Other reviewers?
> 
> So I'm testing this on a box with SME enabled but after loading the
> crash kernel, it freezes instead of rebooting. My cmdline is:
> 
>  kexec -s -p /boot/vmlinuz-5.2.0-rc3+ --initrd=/boot/initrd.img-5.2.0-rc3+ --command-line="maxcpus=1 root=/dev/sda5 ro debug ignore_loglevel log_buf_len=16M no_console_suspend net.ifnames=0 systemd.log_target=null mem_encrypt=on kvm_amd.sev=1 nr_cpus=1 irqpoll reset_devices vga=normal LANG=en_US.UTF-8 earlyprintk=serial cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug transparent_hugepage=never disable_cpu_apicid=0"
> 
> and the reserved range is:
> 
> [    0.000000] Reserving 256MB of memory at 3392MB for crashkernel (System RAM: 16271MB)
> 
> I'm wondering if it is related to
> 
> https://lkml.kernel.org/r/20190604134952.GC26891@MiWiFi-R3L-srv
> 
Yes. It should be an SME issue.

Thanks.
Lianbo

> Thx.
> 


* Re: [PATCH 0/3 v11] add reserved e820 ranges to the kdump kernel e820 table
  2019-06-08 10:26             ` Baoquan He
@ 2019-06-10 11:37               ` Borislav Petkov
  2019-06-12  1:14                 ` lijiang
  2019-06-12  1:55                 ` Baoquan He
  0 siblings, 2 replies; 25+ messages in thread
From: Borislav Petkov @ 2019-06-10 11:37 UTC (permalink / raw)
  To: Baoquan He
  Cc: lijiang, linux-kernel, kexec, tglx, mingo, akpm, dave.hansen,
	luto, peterz, x86, hpa, dyoung, Thomas.Lendacky

On Sat, Jun 08, 2019 at 06:26:59PM +0800, Baoquan He wrote:
> OK, I see. Then it should be the issue we have met and talked about with
> Tom.
> https://lkml.kernel.org/r/20190604134952.GC26891@MiWiFi-R3L-srv
> 
> You can apply Tom's patch as below. I tested it, it can make kexec
> kernel succeed to boot, but failed for kdump kernel booting. The kdump
> kernel can boot till the end of kernel initialization, then hang with a
> call trace. I have pasted the log in the above thread. Haven't got the
> reason.
> http://lkml.kernel.org/r/508c2853-dc4f-70a6-6fa8-97c950dc31c6@amd.com

I can confirm the same observation.

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/3 v11] add reserved e820 ranges to the kdump kernel e820 table
  2019-06-10 11:37               ` Borislav Petkov
@ 2019-06-12  1:14                 ` lijiang
  2019-06-12  1:55                 ` Baoquan He
  1 sibling, 0 replies; 25+ messages in thread
From: lijiang @ 2019-06-12  1:14 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Baoquan He, linux-kernel, kexec, tglx, mingo, akpm, dave.hansen,
	luto, peterz, x86, hpa, dyoung, Thomas.Lendacky

On 2019-06-10 19:37, Borislav Petkov wrote:
> On Sat, Jun 08, 2019 at 06:26:59PM +0800, Baoquan He wrote:
>> OK, I see. Then it should be the issue we have met and talked about with
>> Tom.
>> https://lkml.kernel.org/r/20190604134952.GC26891@MiWiFi-R3L-srv
>>
>> You can apply Tom's patch as below. I tested it, it can make kexec
>> kernel succeed to boot, but failed for kdump kernel booting. The kdump
>> kernel can boot till the end of kernel initialization, then hang with a
>> call trace. I have pasted the log in the above thread. Haven't got the
>> reason.
>> http://lkml.kernel.org/r/508c2853-dc4f-70a6-6fa8-97c950dc31c6@amd.com
> 
> I can confirm the same observation.
> 
So far I haven't seen any updates, so I'm not sure whether this patch
passed your test.

Thanks.
Lianbo

> Thx.
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/3 v11] add reserved e820 ranges to the kdump kernel e820 table
  2019-06-10 11:37               ` Borislav Petkov
  2019-06-12  1:14                 ` lijiang
@ 2019-06-12  1:55                 ` Baoquan He
  2019-06-12  5:49                   ` Dave Young
  2019-06-12 15:10                   ` Borislav Petkov
  1 sibling, 2 replies; 25+ messages in thread
From: Baoquan He @ 2019-06-12  1:55 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: lijiang, linux-kernel, kexec, tglx, mingo, akpm, dave.hansen,
	luto, peterz, x86, hpa, dyoung, Thomas.Lendacky

On 06/10/19 at 01:37pm, Borislav Petkov wrote:
> On Sat, Jun 08, 2019 at 06:26:59PM +0800, Baoquan He wrote:
> > OK, I see. Then it should be the issue we have met and talked about with
> > Tom.
> > https://lkml.kernel.org/r/20190604134952.GC26891@MiWiFi-R3L-srv
> > 
> > You can apply Tom's patch as below. I tested it, it can make kexec
> > kernel succeed to boot, but failed for kdump kernel booting. The kdump
> > kernel can boot till the end of kernel initialization, then hang with a
> > call trace. I have pasted the log in the above thread. Haven't got the
> > reason.
> > http://lkml.kernel.org/r/508c2853-dc4f-70a6-6fa8-97c950dc31c6@amd.com
> 
> I can confirm the same observation.

After further investigation, the failure after applying Tom's patch turns
out to be caused by OOM. When the crashkernel reservation is increased to
512M, the kdump kernel boots successfully. I noticed your crashkernel
reservation is 256M, which will very likely fail and get stuck there.

So Tom's patch can fix the issue. We need to check further why so much more
crashkernel memory is needed on those AMD boxes with SME support.

Thanks
Baoquan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/3 v11] add reserved e820 ranges to the kdump kernel e820 table
  2019-06-12  1:55                 ` Baoquan He
@ 2019-06-12  5:49                   ` Dave Young
  2019-06-12 15:10                   ` Borislav Petkov
  1 sibling, 0 replies; 25+ messages in thread
From: Dave Young @ 2019-06-12  5:49 UTC (permalink / raw)
  To: Baoquan He
  Cc: Borislav Petkov, lijiang, linux-kernel, kexec, tglx, mingo, akpm,
	dave.hansen, luto, peterz, x86, hpa, Thomas.Lendacky

On 06/12/19 at 09:55am, Baoquan He wrote:
> On 06/10/19 at 01:37pm, Borislav Petkov wrote:
> > On Sat, Jun 08, 2019 at 06:26:59PM +0800, Baoquan He wrote:
> > > OK, I see. Then it should be the issue we have met and talked about with
> > > Tom.
> > > https://lkml.kernel.org/r/20190604134952.GC26891@MiWiFi-R3L-srv
> > > 
> > > You can apply Tom's patch as below. I tested it, it can make kexec
> > > kernel succeed to boot, but failed for kdump kernel booting. The kdump
> > > kernel can boot till the end of kernel initialization, then hang with a
> > > call trace. I have pasted the log in the above thread. Haven't got the
> > > reason.
> > > http://lkml.kernel.org/r/508c2853-dc4f-70a6-6fa8-97c950dc31c6@amd.com
> > 
> > I can confirm the same observation.
> 
> With further investigation, the failure after applying Tom's patch is
> caused by OOM. When increase crashkernel reservation to 512M, kdump
> kernel can boot successfully. I noticed your crashkernel reservation is
> 256M, that will fail and stuck there very possibly.

Usually, for a Fedora/RHEL variant kernel + userspace, 160M is a good value
that works for common setups.  Sometimes people forget to strip debuginfo,
so the kernel modules packed into the initramfs are too big and cause an
out-of-memory condition.

> 
> So Tom's patch can fix the issue. We need further check why much more
> crashkernel memory is needed on those AMD boxes with sme support..
> 
> Thanks
> Baoquan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/3 v11] add reserved e820 ranges to the kdump kernel e820 table
  2019-06-12  1:55                 ` Baoquan He
  2019-06-12  5:49                   ` Dave Young
@ 2019-06-12 15:10                   ` Borislav Petkov
  2019-06-12 16:52                     ` Lendacky, Thomas
  1 sibling, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2019-06-12 15:10 UTC (permalink / raw)
  To: Baoquan He, Thomas.Lendacky
  Cc: lijiang, linux-kernel, kexec, tglx, mingo, akpm, dave.hansen,
	luto, peterz, x86, hpa, dyoung

On Wed, Jun 12, 2019 at 09:55:49AM +0800, Baoquan He wrote:
> With further investigation, the failure after applying Tom's patch is
> caused by OOM. When increase crashkernel reservation to 512M, kdump
> kernel can boot successfully. I noticed your crashkernel reservation is
> 256M, that will fail and stuck there very possibly.
> 
> So Tom's patch can fix the issue. We need further check why much more
> crashkernel memory is needed on those AMD boxes with sme support..

Yes, 256M for a kexec kernel sounds pretty much enough to me. So there's
something else at play here. I wonder if that workarea after _end, from
Tom's patch, needs so much room...

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/3 v11] add reserved e820 ranges to the kdump kernel e820 table
  2019-06-12 15:10                   ` Borislav Petkov
@ 2019-06-12 16:52                     ` Lendacky, Thomas
  2019-06-12 18:07                       ` Borislav Petkov
  0 siblings, 1 reply; 25+ messages in thread
From: Lendacky, Thomas @ 2019-06-12 16:52 UTC (permalink / raw)
  To: Borislav Petkov, Baoquan He
  Cc: lijiang, linux-kernel, kexec, tglx, mingo, akpm, dave.hansen,
	luto, peterz, x86, hpa, dyoung

On 6/12/19 10:10 AM, Borislav Petkov wrote:
> On Wed, Jun 12, 2019 at 09:55:49AM +0800, Baoquan He wrote:
>> With further investigation, the failure after applying Tom's patch is
>> caused by OOM. When increase crashkernel reservation to 512M, kdump
>> kernel can boot successfully. I noticed your crashkernel reservation is
>> 256M, that will fail and stuck there very possibly.
>>
>> So Tom's patch can fix the issue. We need further check why much more
>> crashkernel memory is needed on those AMD boxes with sme support..
> 
> Yes, 256M for a kexec kernel sounds pretty much enough to me. So there's
> something else at play here. I wonder if that workarea after _end, from
> Tom's patch, needs so much room...

I think the discussion ended up being that debuginfo wasn't being stripped
from the kernel and initrd (mainly the initrd).  What are the sizes of
the kernel and initrd that you are loading for kdump via kexec?

From previous post:
  kexec -s -p /boot/vmlinuz-5.2.0-rc3+ --initrd=/boot/initrd.img-5.2.0-rc3+

Thanks,
Tom

> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/3 v11] add reserved e820 ranges to the kdump kernel e820 table
  2019-06-12 16:52                     ` Lendacky, Thomas
@ 2019-06-12 18:07                       ` Borislav Petkov
  2019-06-12 19:10                         ` Lendacky, Thomas
  2019-06-13  1:18                         ` dyoung
  0 siblings, 2 replies; 25+ messages in thread
From: Borislav Petkov @ 2019-06-12 18:07 UTC (permalink / raw)
  To: Lendacky, Thomas
  Cc: Baoquan He, lijiang, linux-kernel, kexec, tglx, mingo, akpm,
	dave.hansen, luto, peterz, x86, hpa, dyoung

On Wed, Jun 12, 2019 at 04:52:22PM +0000, Lendacky, Thomas wrote:
> I think the discussion ended up being that debuginfo wasn't being stripped
> from the kernel and initrd (mainly the initrd).  What are the sizes of
> the kernel and initrd that you are loading for kdump via kexec?
> 
> From previous post:
>   kexec -s -p /boot/vmlinuz-5.2.0-rc3+ --initrd=/boot/initrd.img-5.2.0-rc3+

You mean those sizes?

$ ls -lh /boot/vmlinuz-5.2.0-rc3+ /boot/initrd.img-5.2.0-rc3+
-rw-r--r-- 1 root root 7.8M Jun 10 12:53 /boot/initrd.img-5.2.0-rc3+
-rw-r--r-- 1 root root 6.7M Jun 10 12:53 /boot/vmlinuz-5.2.0-rc3+

That should fit easily in 256M :)

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/3 v11] add reserved e820 ranges to the kdump kernel e820 table
  2019-06-12 18:07                       ` Borislav Petkov
@ 2019-06-12 19:10                         ` Lendacky, Thomas
  2019-06-13 15:07                           ` Baoquan He
  2019-06-13  1:18                         ` dyoung
  1 sibling, 1 reply; 25+ messages in thread
From: Lendacky, Thomas @ 2019-06-12 19:10 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Baoquan He, lijiang, linux-kernel, kexec, tglx, mingo, akpm,
	dave.hansen, luto, peterz, x86, hpa, dyoung

On 6/12/19 1:07 PM, Borislav Petkov wrote:
> On Wed, Jun 12, 2019 at 04:52:22PM +0000, Lendacky, Thomas wrote:
>> I think the discussion ended up being that debuginfo wasn't being stripped
>> from the kernel and initrd (mainly the initrd).  What are the sizes of
>> the kernel and initrd that you are loading for kdump via kexec?
>>
>> From previous post:
>>   kexec -s -p /boot/vmlinuz-5.2.0-rc3+ --initrd=/boot/initrd.img-5.2.0-rc3+
> 
> You mean those sizes?
> 
> $ ls -lh /boot/vmlinuz-5.2.0-rc3+ /boot/initrd.img-5.2.0-rc3+
> -rw-r--r-- 1 root root 7.8M Jun 10 12:53 /boot/initrd.img-5.2.0-rc3+
> -rw-r--r-- 1 root root 6.7M Jun 10 12:53 /boot/vmlinuz-5.2.0-rc3+
> 
> That should fit easily in 256M :)

Certainly seems like they should. I know there are other things that are
loaded, but that should be plenty of room. I wonder if Baoquan or Lianbo
could track where things are being loaded to see if everything is being
calculated and placed properly.

Thanks,
Tom

> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/3 v11] add reserved e820 ranges to the kdump kernel e820 table
  2019-06-12 18:07                       ` Borislav Petkov
  2019-06-12 19:10                         ` Lendacky, Thomas
@ 2019-06-13  1:18                         ` dyoung
  1 sibling, 0 replies; 25+ messages in thread
From: dyoung @ 2019-06-13  1:18 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Lendacky, Thomas, Baoquan He, lijiang, linux-kernel, kexec, tglx,
	mingo, akpm, dave.hansen, luto, peterz, x86, hpa

On 06/12/19 at 08:07pm, Borislav Petkov wrote:
> On Wed, Jun 12, 2019 at 04:52:22PM +0000, Lendacky, Thomas wrote:
> > I think the discussion ended up being that debuginfo wasn't being stripped
> > from the kernel and initrd (mainly the initrd).  What are the sizes of
> > the kernel and initrd that you are loading for kdump via kexec?
> > 
> > From previous post:
> >   kexec -s -p /boot/vmlinuz-5.2.0-rc3+ --initrd=/boot/initrd.img-5.2.0-rc3+
> 
> You mean those sizes?
> 
> $ ls -lh /boot/vmlinuz-5.2.0-rc3+ /boot/initrd.img-5.2.0-rc3+
> -rw-r--r-- 1 root root 7.8M Jun 10 12:53 /boot/initrd.img-5.2.0-rc3+
> -rw-r--r-- 1 root root 6.7M Jun 10 12:53 /boot/vmlinuz-5.2.0-rc3+
> 
> That should fit easily in 256M :)

What ends up being used is the uncompressed size. For example, in my case:

$ ls -lth arch/x86/boot/bzImage 
-rw-rw-r-- 1 dyoung dyoung 6.3M May 24 11:19 arch/x86/boot/bzImage
$ ls -lth arch/x86/boot/compressed/vmlinux.bin
-rwxrwxr-x 1 dyoung dyoung 25M May 24 11:19 arch/x86/boot/compressed/vmlinux.bin

The vmlinuz is 6.3M and the uncompressed kernel is about 25M. Since your
bzImage is 7.8M, I would expect the final size to be around 29M.

For the initramfs, you can check it with:

$ ls -lth /boot/initramfs-5.0.9-301.fc30.x86_64kdump.img
-rw------- 1 root root 16M May 28 08:59 /boot/initramfs-5.0.9-301.fc30.x86_64kdump.img
$ mkdir tmp
$ cd tmp
$ sudo lsinitrd --unpack /boot/initramfs-5.0.9-301.fc30.x86_64kdump.img
$ du -hs .
46M	.

You can see my kdump initrd is 46M after unpacking.

> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> Good mailing practices for 400: avoid top-posting and trim the reply.

Thanks
Dave

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/3 v11] add reserved e820 ranges to the kdump kernel e820 table
  2019-06-12 19:10                         ` Lendacky, Thomas
@ 2019-06-13 15:07                           ` Baoquan He
  0 siblings, 0 replies; 25+ messages in thread
From: Baoquan He @ 2019-06-13 15:07 UTC (permalink / raw)
  To: Borislav Petkov, Lendacky, Thomas
  Cc: lijiang, linux-kernel, kexec, tglx, mingo, akpm, dave.hansen,
	luto, peterz, x86, hpa, dyoung

On 06/12/19 at 07:10pm, Lendacky, Thomas wrote:
> On 6/12/19 1:07 PM, Borislav Petkov wrote:
> > On Wed, Jun 12, 2019 at 04:52:22PM +0000, Lendacky, Thomas wrote:
> >> I think the discussion ended up being that debuginfo wasn't being stripped
> >> from the kernel and initrd (mainly the initrd).  What are the sizes of
> >> the kernel and initrd that you are loading for kdump via kexec?
> >>
> >> From previous post:
> >>   kexec -s -p /boot/vmlinuz-5.2.0-rc3+ --initrd=/boot/initrd.img-5.2.0-rc3+
> > 
> > You mean those sizes?
> > 
> > $ ls -lh /boot/vmlinuz-5.2.0-rc3+ /boot/initrd.img-5.2.0-rc3+
> > -rw-r--r-- 1 root root 7.8M Jun 10 12:53 /boot/initrd.img-5.2.0-rc3+
> > -rw-r--r-- 1 root root 6.7M Jun 10 12:53 /boot/vmlinuz-5.2.0-rc3+
> > 
> > That should fit easily in 256M :)
> 
> Certainly seems like they should. I know there are other things that are
> loaded, but that should be plenty of room. I wonder if Baoquan or Lianbo
> could track where things are being loaded to see if everything is being
> calculated and placed properly.

Today I did some investigation on speedway and on another customer's
machine with SME support.

In the kdump kernel boot log, we can see the memory usage printed by
mem_init_print_info(), called from mem_init(), as below. At that point all
memblock memory is freed into the buddy allocator. We can see the kernel had
used 144828K (reserved) before this, about 144M. This is certain; I got the
same value by adding the memblock=debug kernel parameter.

[    2.109408] Kernel command line: BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.2.0-rc4+ ro mem_encrypt=on resume=/dev/mapper/rhel_amd--speedway--05-swap console=ttyS0,115200 earlyprintk=serial,0x6000,115200 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug transparent_hugepage=never nr_cpus=1 debug nokaslr disable_cpu_apicid=0 elfcorehdr=1899892K
[    2.155433] Memory: 65572K/262128K available (12292K kernel code, 2047K rwdata, 3840K rodata, 2344K init, 6360K bss, 144828K reserved, 0K cma-reserved)

The free memory in the buddy allocator is 65572K, about 65M. This confuses
me. I added the code below to print the free memory, and it is about 64M.
It does not seem to have changed. I need to read the code and check further.

[    5.775595] bhe: free:0x10304
[    5.778612] Mem-Info:
[    5.780923] active_anon:1818 inactive_anon:12837 isolated_anon:0
[    5.780923]  active_file:0 inactive_file:0 isolated_file:0
[    5.780923]  unevictable:0 dirty:0 writeback:0 unstable:0
[    5.780923]  slab_reclaimable:1995 slab_unreclaimable:3347
[    5.780923]  mapped:0 shmem:14662 pagetables:1 bounce:0
[    5.780923]  free:16577 free_pcp:3 free_cma:0

--- a/init/main.c
+++ b/init/main.c
@@ -1168,6 +1168,8 @@ static noinline void __init kernel_init_freeable(void)
 
        do_basic_setup();
 
+       pr_info("bhe: free:0x%lx\n", nr_free_pages() << (PAGE_SHIFT - 10));
+       show_mem(0, NULL);
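
The hex value printed by the pr_info() line and the page count printed by
show_mem() can be cross-checked by hand. The sketch below repeats the same
shift in userspace. It assumes 4 KiB pages (PAGE_SHIFT = 12, an assumption
about this x86-64 build that the log does not state), and pages_to_kib() is
a hypothetical helper name, not a kernel function:

```c
/* Assumption: 4 KiB pages, i.e. PAGE_SHIFT == 12 as on typical x86-64. */
#define PAGE_SHIFT 12

/*
 * Convert a page count to KiB, using the same expression as the debug
 * pr_info(): nr_free_pages() << (PAGE_SHIFT - 10).
 */
static unsigned long pages_to_kib(unsigned long nr_pages)
{
	return nr_pages << (PAGE_SHIFT - 10);
}
```

Under that assumption, the free:16577 pages reported by show_mem() give
66308 KiB, which is 0x10304 in hex and a little under 65M, so the two
debug outputs describe the same amount of free memory.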

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Above is the code debugging and analysis. From the testing results, there
are several things impacting the memory usage of the kdump kernel.

1) We need to strip DEBUG INFO from kernel modules, otherwise it will bloat
the initrd and the space it takes.

2) Some machines will consume more memory than others, because they have
more PCI devices or different devices and drivers. Before the init process
runs, we detect and initialize them, and this eats memory.

In my testing, the speedway machine, which has 128 CPUs, obviously consumes
more memory than one HP machine. On the HP machine, even with 160M of
crashkernel memory and DEBUG INFO stripped, the kdump kernel works well,
while 160M of crashkernel memory doesn't satisfy the speedway machine; it
needs 256M.

3) Some extra kernel parameters may impact memory usage. E.g. in Boris's
test, 'log_buf_len=16M' and 'debug' are added, which costs extra
memory.

kexec -s -p /boot/vmlinuz-5.2.0-rc3+ --initrd=/boot/initrd.img-5.2.0-rc3+ --command-line="maxcpus=1 root=/dev/sda5 ro debug ignore_loglevel
log_buf_len=16M no_console_suspend net.ifnames=0 systemd.log_target=null mem_encrypt=on kvm_amd.sev=1 nr_cpus=1 irqpoll reset_devices vga=normal
LANG=en_US.UTF-8 earlyprintk=serial cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug
transparent_hugepage=never disable_cpu_apicid=0"

Anyway, I will continue investigating to see if I can get exact
information from kernel messages or debugging.

Thanks
Baoquan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [tip:x86/kdump] x86/e820, ioport: Add a new I/O resource descriptor IORES_DESC_RESERVED
  2019-04-23  1:30 ` [PATCH 1/3 v11] x86/e820, resource: add a new I/O resource descriptor 'IORES_DESC_RESERVED' Lianbo Jiang
@ 2019-06-20  9:59   ` tip-bot for Lianbo Jiang
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot for Lianbo Jiang @ 2019-06-20  9:59 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: bp, rppt, mingo, huang.zijiang, n-horiguchi, x86, linux-kernel,
	m.mizuma, jgross, mhocko, peterz, joe, luto, thomas.lendacky,
	tglx, lijiang, hpa, mingo, akpm

Commit-ID:  ae9e13d621d6795ec1ad6bf10bd2549c6c3feca4
Gitweb:     https://git.kernel.org/tip/ae9e13d621d6795ec1ad6bf10bd2549c6c3feca4
Author:     Lianbo Jiang <lijiang@redhat.com>
AuthorDate: Tue, 23 Apr 2019 09:30:05 +0800
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Thu, 20 Jun 2019 09:54:31 +0200

x86/e820, ioport: Add a new I/O resource descriptor IORES_DESC_RESERVED

When executing the kexec_file_load() syscall, the first kernel needs to
pass the e820 reserved ranges to the second kernel because some devices
(PCI, for example) need them present in the kdump kernel for proper
initialization.

But the kernel can not exactly match the e820 reserved ranges when
walking through the iomem resources using the default IORES_DESC_NONE
descriptor, because there are several types of e820 ranges which are
marked IORES_DESC_NONE, see e820_type_to_iores_desc().

Therefore, add a new I/O resource descriptor called IORES_DESC_RESERVED
to mark exactly those ranges. It will be used to match the reserved
resource ranges when walking through iomem resources.

 [ bp: Massage commit message. ]

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: bhe@redhat.com
Cc: dave.hansen@linux.intel.com
Cc: dyoung@redhat.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huang Zijiang <huang.zijiang@zte.com.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: kexec@lists.infradead.org
Cc: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190423013007.17838-2-lijiang@redhat.com
---
 arch/x86/kernel/e820.c | 2 +-
 include/linux/ioport.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 8f32e705a980..e69408bf664b 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1063,10 +1063,10 @@ static unsigned long __init e820_type_to_iores_desc(struct e820_entry *entry)
 	case E820_TYPE_NVS:		return IORES_DESC_ACPI_NV_STORAGE;
 	case E820_TYPE_PMEM:		return IORES_DESC_PERSISTENT_MEMORY;
 	case E820_TYPE_PRAM:		return IORES_DESC_PERSISTENT_MEMORY_LEGACY;
+	case E820_TYPE_RESERVED:	return IORES_DESC_RESERVED;
 	case E820_TYPE_RESERVED_KERN:	/* Fall-through: */
 	case E820_TYPE_RAM:		/* Fall-through: */
 	case E820_TYPE_UNUSABLE:	/* Fall-through: */
-	case E820_TYPE_RESERVED:	/* Fall-through: */
 	default:			return IORES_DESC_NONE;
 	}
 }
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index da0ebaec25f0..6ed59de48bd5 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -133,6 +133,7 @@ enum {
 	IORES_DESC_PERSISTENT_MEMORY_LEGACY	= 5,
 	IORES_DESC_DEVICE_PRIVATE_MEMORY	= 6,
 	IORES_DESC_DEVICE_PUBLIC_MEMORY		= 7,
+	IORES_DESC_RESERVED			= 8,
 };
 
 /* helpers to define resources */
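
To illustrate the effect of the e820.c hunk outside the kernel, here is a
minimal userspace sketch. It mirrors only the cases visible in the hunk.
The enumerator values are local to the sketch (only the names follow the
kernel), and desc_before()/desc_after() are hypothetical helper names:

```c
/* Userspace sketch only; enumerator values are arbitrary here. */
enum e820_type {
	E820_TYPE_RAM,
	E820_TYPE_RESERVED,
	E820_TYPE_NVS,
	E820_TYPE_UNUSABLE,
	E820_TYPE_PMEM,
	E820_TYPE_PRAM,
	E820_TYPE_RESERVED_KERN,
};

enum iores_desc {
	IORES_DESC_NONE,
	IORES_DESC_ACPI_NV_STORAGE,
	IORES_DESC_PERSISTENT_MEMORY,
	IORES_DESC_PERSISTENT_MEMORY_LEGACY,
	IORES_DESC_RESERVED,
};

/* Mapping before the patch: E820_TYPE_RESERVED falls through to NONE. */
static enum iores_desc desc_before(enum e820_type type)
{
	switch (type) {
	case E820_TYPE_NVS:	return IORES_DESC_ACPI_NV_STORAGE;
	case E820_TYPE_PMEM:	return IORES_DESC_PERSISTENT_MEMORY;
	case E820_TYPE_PRAM:	return IORES_DESC_PERSISTENT_MEMORY_LEGACY;
	default:		return IORES_DESC_NONE;
	}
}

/* Mapping after the patch: reserved ranges get their own descriptor. */
static enum iores_desc desc_after(enum e820_type type)
{
	if (type == E820_TYPE_RESERVED)
		return IORES_DESC_RESERVED;
	return desc_before(type);
}
```

Before the patch, desc_before(E820_TYPE_RESERVED) and
desc_before(E820_TYPE_RAM) both yield IORES_DESC_NONE, so a walk keyed on
IORES_DESC_NONE cannot tell them apart. After the patch, only reserved
ranges yield IORES_DESC_RESERVED.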

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [tip:x86/kdump] x86/mm: Rework ioremap resource mapping determination
  2019-04-23  1:30 ` [PATCH 2/3 v11] x86/mm: change the check condition in SEV because a new descriptor is introduced Lianbo Jiang
@ 2019-06-20 10:00   ` tip-bot for Lianbo Jiang
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot for Lianbo Jiang @ 2019-06-20 10:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: lijiang, linux-kernel, peterz, thomas.lendacky, mingo, tglx, x86,
	mingo, akpm, dave.hansen, bp, hpa, luto

Commit-ID:  5da04cc86d1215fd9fe0e5c88ead6e8428a75e56
Gitweb:     https://git.kernel.org/tip/5da04cc86d1215fd9fe0e5c88ead6e8428a75e56
Author:     Lianbo Jiang <lijiang@redhat.com>
AuthorDate: Tue, 23 Apr 2019 09:30:06 +0800
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Thu, 20 Jun 2019 09:58:07 +0200

x86/mm: Rework ioremap resource mapping determination

On ioremap(), __ioremap_check_mem() does a couple of checks on the
supplied memory range to determine how the range should be mapped and in
particular what protection flags should be used.

Generalize the procedure by introducing IORES_MAP_* flags which control
different aspects of the ioremapping and use them in the respective
helpers which determine which descriptor flags should be set per range.

 [ bp:
   - Rewrite commit message.
   - Add/improve comments.
   - Reflow __ioremap_caller()'s args.
   - s/__ioremap_check_desc/__ioremap_check_encrypted/g;
   - s/__ioremap_res_check/__ioremap_collect_map_flags/g;
   - clarify __ioremap_check_ram()'s purpose. ]

Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Co-developed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: bhe@redhat.com
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: dyoung@redhat.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: kexec@lists.infradead.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190423013007.17838-3-lijiang@redhat.com
---
 arch/x86/mm/ioremap.c  | 71 ++++++++++++++++++++++++++++++++------------------
 include/linux/ioport.h |  9 +++++++
 2 files changed, 54 insertions(+), 26 deletions(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 4b6423e7bd21..e500f1df1140 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -28,9 +28,11 @@
 
 #include "physaddr.h"
 
-struct ioremap_mem_flags {
-	bool system_ram;
-	bool desc_other;
+/*
+ * Descriptor controlling ioremap() behavior.
+ */
+struct ioremap_desc {
+	unsigned int flags;
 };
 
 /*
@@ -62,13 +64,14 @@ int ioremap_change_attr(unsigned long vaddr, unsigned long size,
 	return err;
 }
 
-static bool __ioremap_check_ram(struct resource *res)
+/* Does the range (or a subset of) contain normal RAM? */
+static unsigned int __ioremap_check_ram(struct resource *res)
 {
 	unsigned long start_pfn, stop_pfn;
 	unsigned long i;
 
 	if ((res->flags & IORESOURCE_SYSTEM_RAM) != IORESOURCE_SYSTEM_RAM)
-		return false;
+		return 0;
 
 	start_pfn = (res->start + PAGE_SIZE - 1) >> PAGE_SHIFT;
 	stop_pfn = (res->end + 1) >> PAGE_SHIFT;
@@ -76,28 +79,44 @@ static bool __ioremap_check_ram(struct resource *res)
 		for (i = 0; i < (stop_pfn - start_pfn); ++i)
 			if (pfn_valid(start_pfn + i) &&
 			    !PageReserved(pfn_to_page(start_pfn + i)))
-				return true;
+				return IORES_MAP_SYSTEM_RAM;
 	}
 
-	return false;
+	return 0;
 }
 
-static int __ioremap_check_desc_other(struct resource *res)
+/*
+ * In a SEV guest, NONE and RESERVED should not be mapped encrypted because
+ * there the whole memory is already encrypted.
+ */
+static unsigned int __ioremap_check_encrypted(struct resource *res)
 {
-	return (res->desc != IORES_DESC_NONE);
+	if (!sev_active())
+		return 0;
+
+	switch (res->desc) {
+	case IORES_DESC_NONE:
+	case IORES_DESC_RESERVED:
+		break;
+	default:
+		return IORES_MAP_ENCRYPTED;
+	}
+
+	return 0;
 }
 
-static int __ioremap_res_check(struct resource *res, void *arg)
+static int __ioremap_collect_map_flags(struct resource *res, void *arg)
 {
-	struct ioremap_mem_flags *flags = arg;
+	struct ioremap_desc *desc = arg;
 
-	if (!flags->system_ram)
-		flags->system_ram = __ioremap_check_ram(res);
+	if (!(desc->flags & IORES_MAP_SYSTEM_RAM))
+		desc->flags |= __ioremap_check_ram(res);
 
-	if (!flags->desc_other)
-		flags->desc_other = __ioremap_check_desc_other(res);
+	if (!(desc->flags & IORES_MAP_ENCRYPTED))
+		desc->flags |= __ioremap_check_encrypted(res);
 
-	return flags->system_ram && flags->desc_other;
+	return ((desc->flags & (IORES_MAP_SYSTEM_RAM | IORES_MAP_ENCRYPTED)) ==
+			       (IORES_MAP_SYSTEM_RAM | IORES_MAP_ENCRYPTED));
 }
 
 /*
@@ -106,15 +125,15 @@ static int __ioremap_res_check(struct resource *res, void *arg)
  * resource described not as IORES_DESC_NONE (e.g. IORES_DESC_ACPI_TABLES).
  */
 static void __ioremap_check_mem(resource_size_t addr, unsigned long size,
-				struct ioremap_mem_flags *flags)
+				struct ioremap_desc *desc)
 {
 	u64 start, end;
 
 	start = (u64)addr;
 	end = start + size - 1;
-	memset(flags, 0, sizeof(*flags));
+	memset(desc, 0, sizeof(struct ioremap_desc));
 
-	walk_mem_res(start, end, flags, __ioremap_res_check);
+	walk_mem_res(start, end, desc, __ioremap_collect_map_flags);
 }
 
 /*
@@ -131,15 +150,15 @@ static void __ioremap_check_mem(resource_size_t addr, unsigned long size,
  * have to convert them into an offset in a page-aligned mapping, but the
  * caller shouldn't need to know that small detail.
  */
-static void __iomem *__ioremap_caller(resource_size_t phys_addr,
-		unsigned long size, enum page_cache_mode pcm,
-		void *caller, bool encrypted)
+static void __iomem *
+__ioremap_caller(resource_size_t phys_addr, unsigned long size,
+		 enum page_cache_mode pcm, void *caller, bool encrypted)
 {
 	unsigned long offset, vaddr;
 	resource_size_t last_addr;
 	const resource_size_t unaligned_phys_addr = phys_addr;
 	const unsigned long unaligned_size = size;
-	struct ioremap_mem_flags mem_flags;
+	struct ioremap_desc io_desc;
 	struct vm_struct *area;
 	enum page_cache_mode new_pcm;
 	pgprot_t prot;
@@ -158,12 +177,12 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
 		return NULL;
 	}
 
-	__ioremap_check_mem(phys_addr, size, &mem_flags);
+	__ioremap_check_mem(phys_addr, size, &io_desc);
 
 	/*
 	 * Don't allow anybody to remap normal RAM that we're using..
 	 */
-	if (mem_flags.system_ram) {
+	if (io_desc.flags & IORES_MAP_SYSTEM_RAM) {
 		WARN_ONCE(1, "ioremap on RAM at %pa - %pa\n",
 			  &phys_addr, &last_addr);
 		return NULL;
@@ -201,7 +220,7 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
 	 * resulting mapping.
 	 */
 	prot = PAGE_KERNEL_IO;
-	if ((sev_active() && mem_flags.desc_other) || encrypted)
+	if ((io_desc.flags & IORES_MAP_ENCRYPTED) || encrypted)
 		prot = pgprot_encrypted(prot);
 
 	switch (pcm) {
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 6ed59de48bd5..5db386cfc2d4 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -12,6 +12,7 @@
 #ifndef __ASSEMBLY__
 #include <linux/compiler.h>
 #include <linux/types.h>
+#include <linux/bits.h>
 /*
  * Resources are tree-like, allowing
  * nesting etc..
@@ -136,6 +137,14 @@ enum {
 	IORES_DESC_RESERVED			= 8,
 };
 
+/*
+ * Flags controlling ioremap() behavior.
+ */
+enum {
+	IORES_MAP_SYSTEM_RAM		= BIT(0),
+	IORES_MAP_ENCRYPTED		= BIT(1),
+};
+
 /* helpers to define resources */
 #define DEFINE_RES_NAMED(_start, _size, _name, _flags)			\
 	{								\
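
The flag accumulation done by __ioremap_collect_map_flags() can be tried
out in userspace. The sketch below is not the kernel code: struct fake_res
and walk_ranges() are hypothetical stand-ins for struct resource and
walk_mem_res(), and it assumes the walker stops as soon as the callback
returns nonzero:

```c
#include <stddef.h>

#define BIT(n) (1U << (n))

enum {
	IORES_MAP_SYSTEM_RAM	= BIT(0),
	IORES_MAP_ENCRYPTED	= BIT(1),
};

/* Hypothetical stand-in for struct resource: the flags a range yields. */
struct fake_res {
	unsigned int contributes;
};

struct ioremap_desc {
	unsigned int flags;
};

/* OR in this range's flags; return nonzero once both flags are set. */
static int collect_map_flags(const struct fake_res *res,
			     struct ioremap_desc *desc)
{
	desc->flags |= res->contributes;

	return (desc->flags & (IORES_MAP_SYSTEM_RAM | IORES_MAP_ENCRYPTED)) ==
			      (IORES_MAP_SYSTEM_RAM | IORES_MAP_ENCRYPTED);
}

/*
 * Hypothetical walker: visits ranges in order, stops on the first nonzero
 * callback return, and reports how many ranges were visited.
 */
static size_t walk_ranges(const struct fake_res *res, size_t n,
			  struct ioremap_desc *desc)
{
	size_t i;

	desc->flags = 0;
	for (i = 0; i < n; i++)
		if (collect_map_flags(&res[i], desc))
			return i + 1;

	return n;
}
```

With four ranges contributing 0, IORES_MAP_ENCRYPTED, IORES_MAP_SYSTEM_RAM
and 0 in that order, the walk ends after the third range with both bits
set. Packing the results into one flags word lets further IORES_MAP_*
bits be added later without changing the callback signature.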

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [tip:x86/kdump] x86/crash: Add e820 reserved ranges to kdump kernel's e820 table
  2019-04-23  1:30 ` [PATCH 3/3 v11] x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table Lianbo Jiang
@ 2019-06-20 10:01   ` tip-bot for Lianbo Jiang
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot for Lianbo Jiang @ 2019-06-20 10:01 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: bjorn.helgaas, linux-kernel, wang.yi59, hpa, luto, lijiang,
	mingo, bp, bhe, akpm, thomas.lendacky, dyoung, mingo, gustavo,
	x86, peterz, tglx

Commit-ID:  980621daf368f2b9aa69c7ea01baa654edb7577b
Gitweb:     https://git.kernel.org/tip/980621daf368f2b9aa69c7ea01baa654edb7577b
Author:     Lianbo Jiang <lijiang@redhat.com>
AuthorDate: Tue, 23 Apr 2019 09:30:07 +0800
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Thu, 20 Jun 2019 10:05:06 +0200

x86/crash: Add e820 reserved ranges to kdump kernel's e820 table

At present, when using the kexec_file_load() syscall to load the kernel
image and initramfs, for example:

  kexec -s -p xxx

the kernel does not pass the e820 reserved ranges to the second kernel,
which might cause two problems:

 1. MMCONFIG: A device in PCI segment 1 cannot be discovered by the
kernel PCI probing without all the e820 I/O reservations being present
in the e820 table. Which is the case currently, because the kdump kernel
does not have those reservations because the kexec command does not pass
the I/O reservation via the "memmap=xxx" command line option.

Further details courtesy of Bjorn Helgaas¹: I think you should regard
correct MCFG/ECAM usage in the kdump kernel as a requirement. MMCONFIG
(aka ECAM) space is described in the ACPI MCFG table. If you don't have
ECAM:

  (a) PCI devices won't work at all on non-x86 systems that use only
   ECAM for config access,

  (b) you won't be able to access devices on non-0 segments (granted,
  there aren't very many of these yet, but there will be more in the
  future), and

  (c) you won't be able to access extended config space (addresses
  0x100-0xfff), which means none of the Extended Capabilities will be
  available (AER, ACS, ATS, etc).

 2. SME: The kdump kernel does not work with SME unless the e820
reserved ranges are present. When SME is active in the kdump kernel,
those reserved regions are still decrypted, but because they are absent
from the kdump kernel's e820 table, they are accessed as encrypted,
which is wrong.
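
As background for points (b) and (c) of the first issue: ECAM maps each function's 4 KiB config space into memory, so the register offset has 12 bits (0x000-0xfff) instead of the 8 bits of the legacy 0xCF8/0xCFC port mechanism. The sketch below computes such an address. The struct layout and the function name are hypothetical; they only mirror the fields an MCFG entry describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one ECAM window (one MCFG entry). */
struct ecam_window {
	uint64_t base;		/* physical base address of the window */
	uint16_t segment;	/* PCI segment group number */
	uint8_t  start_bus;	/* first bus number decoded by the window */
	uint8_t  end_bus;	/* last bus number decoded by the window */
};

/*
 * Store the physical address of a config register in *addr.
 * Returns false if the window does not cover the requested function.
 */
bool ecam_address(const struct ecam_window *w, uint16_t segment,
		  uint8_t bus, uint8_t dev, uint8_t fn, uint16_t offset,
		  uint64_t *addr)
{
	if (segment != w->segment || bus < w->start_bus || bus > w->end_bus)
		return false;
	if (dev > 31 || fn > 7 || offset > 0xfff)
		return false;

	/* 1 MiB per bus, 32 KiB per device, 4 KiB per function. */
	*addr = w->base +
		(((uint64_t)(bus - w->start_bus) << 20) |
		 ((uint64_t)dev << 15) |
		 ((uint64_t)fn << 12) |
		 offset);
	return true;
}
```

An offset such as 0x100, where the Extended Capabilities begin, is only reachable through this memory window, and the window itself must lie in a range the kernel knows to be reserved.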

 [1]: https://lkml.kernel.org/r/CABhMZUUscS3jUZUSM5Y6EYJK6weo7Mjj5-EAKGvbw0qEe%2B38zw@mail.gmail.com

 [ bp: Heavily massage commit message. ]

Suggested-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Baoquan He <bhe@redhat.com>
Cc: Bjorn Helgaas <bjorn.helgaas@gmail.com>
Cc: dave.hansen@linux.intel.com
Cc: Dave Young <dyoung@redhat.com>
Cc: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: kexec@lists.infradead.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86-ml <x86@kernel.org>
Cc: Yi Wang <wang.yi59@zte.com.cn>
Link: https://lkml.kernel.org/r/20190423013007.17838-4-lijiang@redhat.com
---
 arch/x86/kernel/crash.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 576b2e1bfc12..32c956705b8e 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -381,6 +381,12 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params)
 	walk_iomem_res_desc(IORES_DESC_ACPI_NV_STORAGE, flags, 0, -1, &cmd,
 			memmap_entry_callback);
 
+	/* Add e820 reserved ranges */
+	cmd.type = E820_TYPE_RESERVED;
+	flags = IORESOURCE_MEM;
+	walk_iomem_res_desc(IORES_DESC_RESERVED, flags, 0, -1, &cmd,
+			   memmap_entry_callback);
+
 	/* Add crashk_low_res region */
 	if (crashk_low_res.end) {
 		ei.addr = crashk_low_res.start;
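
The pattern used in the hunk above (walk all resources carrying a given descriptor and let a callback append each one to the e820 table) can be sketched in userspace as follows. All types, names, and enum values here are simplified, hypothetical stand-ins and not the kernel's definitions; only the "size = end - start + 1" convention for inclusive resource ranges is taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-ins for the kernel types. */
enum res_desc { DESC_NONE, DESC_ACPI_NV, DESC_RESERVED };
enum e820_kind { KIND_RAM = 1, KIND_RESERVED = 2 };

struct res { uint64_t start, end; enum res_desc desc; };	/* end is inclusive */
struct e820_ent { uint64_t addr, size; enum e820_kind type; };

#define MAX_ENTRIES 8

struct e820_tab { struct e820_ent ent[MAX_ENTRIES]; size_t nr; };
struct walk_ctx { struct e820_tab *tab; enum e820_kind type; };

typedef int (*res_cb)(const struct res *r, void *arg);

/* Call cb for every resource whose descriptor matches; stop on error. */
int walk_res_desc(const struct res *res, size_t n, enum res_desc desc,
		  void *arg, res_cb cb)
{
	for (size_t i = 0; i < n; i++) {
		int ret;

		if (res[i].desc != desc)
			continue;
		ret = cb(&res[i], arg);
		if (ret)
			return ret;
	}
	return 0;
}

/* Append one resource to the table, using the type chosen by the caller. */
int add_entry_cb(const struct res *r, void *arg)
{
	struct walk_ctx *ctx = arg;
	struct e820_tab *tab = ctx->tab;

	if (tab->nr >= MAX_ENTRIES)
		return -1;

	tab->ent[tab->nr].addr = r->start;
	tab->ent[tab->nr].size = r->end - r->start + 1;
	tab->ent[tab->nr].type = ctx->type;
	tab->nr++;
	return 0;
}
```

Setting the entry type in the context before each walk, as the patch does with cmd.type, lets one callback serve several descriptor classes.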

Thread overview: 25+ messages
2019-04-23  1:30 [PATCH 0/3 v11] add reserved e820 ranges to the kdump kernel e820 table Lianbo Jiang
2019-04-23  1:30 ` [PATCH 1/3 v11] x86/e820, resource: add a new I/O resource descriptor 'IORES_DESC_RESERVED' Lianbo Jiang
2019-06-20  9:59   ` [tip:x86/kdump] x86/e820, ioport: Add a new I/O resource descriptor IORES_DESC_RESERVED tip-bot for Lianbo Jiang
2019-04-23  1:30 ` [PATCH 2/3 v11] x86/mm: change the check condition in SEV because a new descriptor is introduced Lianbo Jiang
2019-06-20 10:00   ` [tip:x86/kdump] x86/mm: Rework ioremap resource mapping determination tip-bot for Lianbo Jiang
2019-04-23  1:30 ` [PATCH 3/3 v11] x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table Lianbo Jiang
2019-06-20 10:01   ` [tip:x86/kdump] x86/crash: Add e820 reserved ranges to kdump kernel's " tip-bot for Lianbo Jiang
2019-05-28  7:30 ` [PATCH 0/3 v11] add reserved e820 ranges to the kdump kernel " lijiang
2019-06-07 17:42   ` Borislav Petkov
2019-06-08  3:54     ` Baoquan He
2019-06-08  9:10       ` Borislav Petkov
2019-06-08 10:01         ` Baoquan He
2019-06-08 10:06           ` Borislav Petkov
2019-06-08 10:26             ` Baoquan He
2019-06-10 11:37               ` Borislav Petkov
2019-06-12  1:14                 ` lijiang
2019-06-12  1:55                 ` Baoquan He
2019-06-12  5:49                   ` Dave Young
2019-06-12 15:10                   ` Borislav Petkov
2019-06-12 16:52                     ` Lendacky, Thomas
2019-06-12 18:07                       ` Borislav Petkov
2019-06-12 19:10                         ` Lendacky, Thomas
2019-06-13 15:07                           ` Baoquan He
2019-06-13  1:18                         ` dyoung
2019-06-09  4:02     ` lijiang
