LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH 0/6] hv: Remove dependencies on guest page size
@ 2019-04-05  1:11 Maya Nakamura
  2019-04-05  1:13 ` [PATCH 1/6] x86: hv: hyperv-tlfs.h: Create and use Hyper-V page definitions Maya Nakamura
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Maya Nakamura @ 2019-04-05  1:11 UTC (permalink / raw)
  To: mikelley, kys, haiyangz, sthemmin, sashal; +Cc: x86, linux-hyperv, linux-kernel

The Linux guest page size and hypervisor page size concepts are
different, even though they happen to be the same value on x86. Hyper-V
code mixes up the two, so this patchset begins to address that by
creating and using a set of Hyper-V specific page definitions.

A major benefit of those new definitions is that they support non-x86
architectures, such as ARM64, that use different page sizes. On ARM64,
the guest page size may not be 4096, and Hyper-V always runs with a page
size of 4096.

In this patchset, the first two patches lay the foundation for the
others, creating definitions and preparing for memory allocations of
Hyper-V pages. Subsequent patches apply the definitions where the guest
VM and Hyper-V communicate, and where the code intends to use the
Hyper-V page size. The last two patches set the ring buffer size to a
fixed value, removing the dependencies on the guest page size.

This is the initial set of changes to the Hyper-V code, and future
patches will make additional changes using the same foundation, for
example, replace __vmalloc() and related functions when Hyper-V pages
are intended.

Maya Nakamura (6):
  x86: hv: hyperv-tlfs.h: Create and use Hyper-V page definitions
  x86: hv: hv_init.c: Replace alloc_page() with kmem_cache_alloc()
  hv: vmbus: Replace page definition with Hyper-V specific one
  x86: hv: mmu.c: Replace page definitions with Hyper-V specific ones
  HID: hv: Remove dependencies on PAGE_SIZE for ring buffer
  Input: hv: Remove dependencies on PAGE_SIZE for ring buffer

 arch/x86/hyperv/hv_init.c             | 38 +++++++++++++++++++--------
 arch/x86/hyperv/mmu.c                 | 15 ++++++-----
 arch/x86/include/asm/hyperv-tlfs.h    | 12 ++++++++-
 drivers/hid/hid-hyperv.c              |  4 +--
 drivers/hv/hyperv_vmbus.h             |  8 +++---
 drivers/input/serio/hyperv-keyboard.c |  4 +--
 6 files changed, 54 insertions(+), 27 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/6] x86: hv: hyperv-tlfs.h: Create and use Hyper-V page definitions
  2019-04-05  1:11 [PATCH 0/6] hv: Remove dependencies on guest page size Maya Nakamura
@ 2019-04-05  1:13 ` Maya Nakamura
  2019-04-05  1:14 ` [PATCH 2/6] x86: hv: hv_init.c: Replace alloc_page() with kmem_cache_alloc() Maya Nakamura
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Maya Nakamura @ 2019-04-05  1:13 UTC (permalink / raw)
  To: mikelley, kys, haiyangz, sthemmin, sashal; +Cc: x86, linux-hyperv, linux-kernel

Define HV_HYP_PAGE_SHIFT, HV_HYP_PAGE_SIZE, and HV_HYP_PAGE_MASK because
the Linux guest page size and hypervisor page size concepts are
different, even though they happen to be the same value on x86.

Also, replace PAGE_SIZE with HV_HYP_PAGE_SIZE.

Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
---
 arch/x86/include/asm/hyperv-tlfs.h | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index cdf44aa9a501..44bd68aefd00 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -12,6 +12,16 @@
 #include <linux/types.h>
 #include <asm/page.h>
 
+/*
+ * While not explicitly listed in the TLFS, Hyper-V always runs with a page size
+ * of 4096. These definitions are used when communicating with Hyper-V using
+ * guest physical pages and guest physical page addresses, since the guest page
+ * size may not be 4096 on all architectures.
+ */
+#define HV_HYP_PAGE_SHIFT	12
+#define HV_HYP_PAGE_SIZE	BIT(HV_HYP_PAGE_SHIFT)
+#define HV_HYP_PAGE_MASK	(~(HV_HYP_PAGE_SIZE - 1))
+
 /*
  * The below CPUID leaves are present if VersionAndFeatures.HypervisorPresent
  * is set by CPUID(HvCpuIdFunctionVersionAndFeatures).
@@ -841,7 +851,7 @@ union hv_gpa_page_range {
  * count is equal with how many entries of union hv_gpa_page_range can
  * be populated into the input parameter page.
  */
-#define HV_MAX_FLUSH_REP_COUNT ((PAGE_SIZE - 2 * sizeof(u64)) /	\
+#define HV_MAX_FLUSH_REP_COUNT ((HV_HYP_PAGE_SIZE - 2 * sizeof(u64)) /	\
 				sizeof(union hv_gpa_page_range))
 
 struct hv_guest_mapping_flush_list {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 2/6] x86: hv: hv_init.c: Replace alloc_page() with kmem_cache_alloc()
  2019-04-05  1:11 [PATCH 0/6] hv: Remove dependencies on guest page size Maya Nakamura
  2019-04-05  1:13 ` [PATCH 1/6] x86: hv: hyperv-tlfs.h: Create and use Hyper-V page definitions Maya Nakamura
@ 2019-04-05  1:14 ` Maya Nakamura
  2019-04-05 11:31   ` Vitaly Kuznetsov
  2019-04-05  1:16 ` [PATCH 3/6] hv: vmbus: Replace page definition with Hyper-V specific one Maya Nakamura
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Maya Nakamura @ 2019-04-05  1:14 UTC (permalink / raw)
  To: mikelley, kys, haiyangz, sthemmin, sashal; +Cc: x86, linux-hyperv, linux-kernel

Switch from the function that allocates a single Linux guest page to a
different one to use a Hyper-V page because the guest page size and
hypervisor page size concepts are different, even though they happen to
be the same value on x86.

Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
---
 arch/x86/hyperv/hv_init.c | 38 +++++++++++++++++++++++++++-----------
 1 file changed, 27 insertions(+), 11 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index e4ba467a9fc6..5f946135aa18 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -31,6 +31,7 @@
 #include <linux/hyperv.h>
 #include <linux/slab.h>
 #include <linux/cpuhotplug.h>
+#include <asm/set_memory.h>
 
 #ifdef CONFIG_HYPERV_TSCPAGE
 
@@ -98,18 +99,20 @@ EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);
 u32 hv_max_vp_index;
 EXPORT_SYMBOL_GPL(hv_max_vp_index);
 
+struct kmem_cache *cachep;
+EXPORT_SYMBOL_GPL(cachep);
+
 static int hv_cpu_init(unsigned int cpu)
 {
 	u64 msr_vp_index;
 	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
 	void **input_arg;
-	struct page *pg;
 
 	input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
-	pg = alloc_page(GFP_KERNEL);
-	if (unlikely(!pg))
+	*input_arg = kmem_cache_alloc(cachep, GFP_KERNEL);
+
+	if (unlikely(!*input_arg))
 		return -ENOMEM;
-	*input_arg = page_address(pg);
 
 	hv_get_vp_index(msr_vp_index);
 
@@ -122,14 +125,12 @@ static int hv_cpu_init(unsigned int cpu)
 		return 0;
 
 	if (!*hvp)
-		*hvp = __vmalloc(PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL);
+		*hvp = kmem_cache_alloc(cachep, GFP_KERNEL);
 
 	if (*hvp) {
 		u64 val;
 
-		val = vmalloc_to_pfn(*hvp);
-		val = (val << HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT) |
-			HV_X64_MSR_VP_ASSIST_PAGE_ENABLE;
+		val = virt_to_phys(*hvp) | HV_X64_MSR_VP_ASSIST_PAGE_ENABLE;
 
 		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, val);
 	}
@@ -233,17 +234,22 @@ static int hv_cpu_die(unsigned int cpu)
 	unsigned long flags;
 	void **input_arg;
 	void *input_pg = NULL;
+	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
 
 	local_irq_save(flags);
 	input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
 	input_pg = *input_arg;
 	*input_arg = NULL;
 	local_irq_restore(flags);
-	free_page((unsigned long)input_pg);
+	kmem_cache_free(cachep, input_pg);
+	input_pg = NULL;
 
 	if (hv_vp_assist_page && hv_vp_assist_page[cpu])
 		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);
 
+	kmem_cache_free(cachep, *hvp);
+	*hvp = NULL;
+
 	if (hv_reenlightenment_cb == NULL)
 		return 0;
 
@@ -325,6 +331,11 @@ void __init hyperv_init(void)
 		goto free_vp_index;
 	}
 
+	cachep = kmem_cache_create("hyperv_pages", HV_HYP_PAGE_SIZE,
+				   HV_HYP_PAGE_SIZE, 0, NULL);
+	if (!cachep)
+		goto free_vp_assist_page;
+
 	cpuhp = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/hyperv_init:online",
 				  hv_cpu_init, hv_cpu_die);
 	if (cpuhp < 0)
@@ -338,7 +349,10 @@ void __init hyperv_init(void)
 	guest_id = generate_guest_id(0, LINUX_VERSION_CODE, 0);
 	wrmsrl(HV_X64_MSR_GUEST_OS_ID, guest_id);
 
-	hv_hypercall_pg  = __vmalloc(PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL_RX);
+	hv_hypercall_pg = kmem_cache_alloc(cachep, GFP_KERNEL);
+	if (hv_hypercall_pg)
+		set_memory_x((unsigned long)hv_hypercall_pg, 1);
+
 	if (hv_hypercall_pg == NULL) {
 		wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
 		goto remove_cpuhp_state;
@@ -346,7 +360,8 @@ void __init hyperv_init(void)
 
 	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
 	hypercall_msr.enable = 1;
-	hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
+	hypercall_msr.guest_physical_address = virt_to_phys(hv_hypercall_pg) >>
+		HV_HYP_PAGE_SHIFT;
 	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
 
 	hv_apic_init();
@@ -416,6 +431,7 @@ void hyperv_cleanup(void)
 	 * let hypercall operations fail safely rather than
 	 * panic the kernel for using invalid hypercall page
 	 */
+	kmem_cache_free(cachep, hv_hypercall_pg);
 	hv_hypercall_pg = NULL;
 
 	/* Reset the hypercall page */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 3/6] hv: vmbus: Replace page definition with Hyper-V specific one
  2019-04-05  1:11 [PATCH 0/6] hv: Remove dependencies on guest page size Maya Nakamura
  2019-04-05  1:13 ` [PATCH 1/6] x86: hv: hyperv-tlfs.h: Create and use Hyper-V page definitions Maya Nakamura
  2019-04-05  1:14 ` [PATCH 2/6] x86: hv: hv_init.c: Replace alloc_page() with kmem_cache_alloc() Maya Nakamura
@ 2019-04-05  1:16 ` Maya Nakamura
  2019-04-05  1:17 ` [PATCH 4/6] x86: hv: mmu.c: Replace page definitions with Hyper-V specific ones Maya Nakamura
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Maya Nakamura @ 2019-04-05  1:16 UTC (permalink / raw)
  To: mikelley, kys, haiyangz, sthemmin, sashal; +Cc: x86, linux-hyperv, linux-kernel

Replace PAGE_SIZE with HV_HYP_PAGE_SIZE because the guest page size may
not be 4096 on all architectures and Hyper-V always runs with a page
size of 4096.

Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
---
 drivers/hv/hyperv_vmbus.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index a94aab94e0b5..00ad573870e9 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -207,11 +207,11 @@ int hv_ringbuffer_read(struct vmbus_channel *channel,
 		       u64 *requestid, bool raw);
 
 /*
- * Maximum channels is determined by the size of the interrupt page
- * which is PAGE_SIZE. 1/2 of PAGE_SIZE is for send endpoint interrupt
- * and the other is receive endpoint interrupt
+ * Maximum channels, 16348, is determined by the size of the interrupt page,
+ * which is HV_HYP_PAGE_SIZE. 1/2 of HV_HYP_PAGE_SIZE is to send endpoint
+ * interrupt, and the other is to receive endpoint interrupt.
  */
-#define MAX_NUM_CHANNELS	((PAGE_SIZE >> 1) << 3)	/* 16348 channels */
+#define MAX_NUM_CHANNELS	((HV_HYP_PAGE_SIZE >> 1) << 3)
 
 /* The value here must be in multiple of 32 */
 /* TODO: Need to make this configurable */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 4/6] x86: hv: mmu.c: Replace page definitions with Hyper-V specific ones
  2019-04-05  1:11 [PATCH 0/6] hv: Remove dependencies on guest page size Maya Nakamura
                   ` (2 preceding siblings ...)
  2019-04-05  1:16 ` [PATCH 3/6] hv: vmbus: Replace page definition with Hyper-V specific one Maya Nakamura
@ 2019-04-05  1:17 ` Maya Nakamura
  2019-04-05 11:10   ` Vitaly Kuznetsov
  2019-04-05  1:19 ` [PATCH 5/6] HID: hv: Remove dependencies on PAGE_SIZE for ring buffer Maya Nakamura
  2019-04-05  1:20 ` [PATCH 6/6] Input: " Maya Nakamura
  5 siblings, 1 reply; 17+ messages in thread
From: Maya Nakamura @ 2019-04-05  1:17 UTC (permalink / raw)
  To: mikelley, kys, haiyangz, sthemmin, sashal; +Cc: x86, linux-hyperv, linux-kernel

Replace PAGE_SHIFT, PAGE_SIZE, and PAGE_MASK with HV_HYP_PAGE_SHIFT,
HV_HYP_PAGE_SIZE, and HV_HYP_PAGE_MASK, respectively, because the guest
page size and hypervisor page size concepts are different, even though
they happen to be the same value on x86.

Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
---
 arch/x86/hyperv/mmu.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
index e65d7fe6489f..175f6dcc7362 100644
--- a/arch/x86/hyperv/mmu.c
+++ b/arch/x86/hyperv/mmu.c
@@ -15,7 +15,7 @@
 #include <asm/trace/hyperv.h>
 
 /* Each gva in gva_list encodes up to 4096 pages to flush */
-#define HV_TLB_FLUSH_UNIT (4096 * PAGE_SIZE)
+#define HV_TLB_FLUSH_UNIT (4096 * HV_HYP_PAGE_SIZE)
 
 static u64 hyperv_flush_tlb_others_ex(const struct cpumask *cpus,
 				      const struct flush_tlb_info *info);
@@ -32,15 +32,15 @@ static inline int fill_gva_list(u64 gva_list[], int offset,
 	do {
 		diff = end > cur ? end - cur : 0;
 
-		gva_list[gva_n] = cur & PAGE_MASK;
+		gva_list[gva_n] = cur & HV_HYP_PAGE_MASK;
 		/*
 		 * Lower 12 bits encode the number of additional
 		 * pages to flush (in addition to the 'cur' page).
 		 */
 		if (diff >= HV_TLB_FLUSH_UNIT)
-			gva_list[gva_n] |= ~PAGE_MASK;
+			gva_list[gva_n] |= ~HV_HYP_PAGE_MASK;
 		else if (diff)
-			gva_list[gva_n] |= (diff - 1) >> PAGE_SHIFT;
+			gva_list[gva_n] |= (diff - 1) >> HV_HYP_PAGE_SHIFT;
 
 		cur += HV_TLB_FLUSH_UNIT;
 		gva_n++;
@@ -129,7 +129,8 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus,
 	 * We can flush not more than max_gvas with one hypercall. Flush the
 	 * whole address space if we were asked to do more.
 	 */
-	max_gvas = (PAGE_SIZE - sizeof(*flush)) / sizeof(flush->gva_list[0]);
+	max_gvas = (HV_HYP_PAGE_SIZE - sizeof(*flush)) /
+		    sizeof(flush->gva_list[0]);
 
 	if (info->end == TLB_FLUSH_ALL) {
 		flush->flags |= HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY;
@@ -200,9 +201,9 @@ static u64 hyperv_flush_tlb_others_ex(const struct cpumask *cpus,
 	 * whole address space if we were asked to do more.
 	 */
 	max_gvas =
-		(PAGE_SIZE - sizeof(*flush) - nr_bank *
+		(HV_HYP_PAGE_SIZE - sizeof(*flush) - nr_bank *
 		 sizeof(flush->hv_vp_set.bank_contents[0])) /
-		sizeof(flush->gva_list[0]);
+		 sizeof(flush->gva_list[0]);
 
 	if (info->end == TLB_FLUSH_ALL) {
 		flush->flags |= HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 5/6] HID: hv: Remove dependencies on PAGE_SIZE for ring buffer
  2019-04-05  1:11 [PATCH 0/6] hv: Remove dependencies on guest page size Maya Nakamura
                   ` (3 preceding siblings ...)
  2019-04-05  1:17 ` [PATCH 4/6] x86: hv: mmu.c: Replace page definitions with Hyper-V specific ones Maya Nakamura
@ 2019-04-05  1:19 ` Maya Nakamura
  2019-04-05  1:20 ` [PATCH 6/6] Input: " Maya Nakamura
  5 siblings, 0 replies; 17+ messages in thread
From: Maya Nakamura @ 2019-04-05  1:19 UTC (permalink / raw)
  To: mikelley, kys, haiyangz, sthemmin, sashal; +Cc: x86, linux-hyperv, linux-kernel

Define the ring buffer size as a constant expression because it should
not depend on the guest page size.

Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
---
 drivers/hid/hid-hyperv.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/hid/hid-hyperv.c b/drivers/hid/hid-hyperv.c
index 704049e62d58..4d0b63bf8b33 100644
--- a/drivers/hid/hid-hyperv.c
+++ b/drivers/hid/hid-hyperv.c
@@ -112,8 +112,8 @@ struct synthhid_input_report {
 
 #pragma pack(pop)
 
-#define INPUTVSC_SEND_RING_BUFFER_SIZE		(10*PAGE_SIZE)
-#define INPUTVSC_RECV_RING_BUFFER_SIZE		(10*PAGE_SIZE)
+#define INPUTVSC_SEND_RING_BUFFER_SIZE		(40 * 1024)
+#define INPUTVSC_RECV_RING_BUFFER_SIZE		(40 * 1024)
 
 
 enum pipe_prot_msg_type {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 6/6] Input: hv: Remove dependencies on PAGE_SIZE for ring buffer
  2019-04-05  1:11 [PATCH 0/6] hv: Remove dependencies on guest page size Maya Nakamura
                   ` (4 preceding siblings ...)
  2019-04-05  1:19 ` [PATCH 5/6] HID: hv: Remove dependencies on PAGE_SIZE for ring buffer Maya Nakamura
@ 2019-04-05  1:20 ` Maya Nakamura
  5 siblings, 0 replies; 17+ messages in thread
From: Maya Nakamura @ 2019-04-05  1:20 UTC (permalink / raw)
  To: mikelley, kys, haiyangz, sthemmin, sashal; +Cc: x86, linux-hyperv, linux-kernel

Define the ring buffer size as a constant expression because it should
not depend on the guest page size.

Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
---
 drivers/input/serio/hyperv-keyboard.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/input/serio/hyperv-keyboard.c b/drivers/input/serio/hyperv-keyboard.c
index a8b9be3e28db..e1b0feeaeb91 100644
--- a/drivers/input/serio/hyperv-keyboard.c
+++ b/drivers/input/serio/hyperv-keyboard.c
@@ -83,8 +83,8 @@ struct synth_kbd_keystroke {
 
 #define HK_MAXIMUM_MESSAGE_SIZE 256
 
-#define KBD_VSC_SEND_RING_BUFFER_SIZE		(10 * PAGE_SIZE)
-#define KBD_VSC_RECV_RING_BUFFER_SIZE		(10 * PAGE_SIZE)
+#define KBD_VSC_SEND_RING_BUFFER_SIZE		(40 * 1024)
+#define KBD_VSC_RECV_RING_BUFFER_SIZE		(40 * 1024)
 
 #define XTKBD_EMUL0     0xe0
 #define XTKBD_EMUL1     0xe1
-- 
2.17.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/6] x86: hv: mmu.c: Replace page definitions with Hyper-V specific ones
  2019-04-05  1:17 ` [PATCH 4/6] x86: hv: mmu.c: Replace page definitions with Hyper-V specific ones Maya Nakamura
@ 2019-04-05 11:10   ` Vitaly Kuznetsov
       [not found]     ` <DM5PR2101MB091843B6DD7A11C2C27917F1D7280@DM5PR2101MB0918.namprd21.prod.outlook.com>
  0 siblings, 1 reply; 17+ messages in thread
From: Vitaly Kuznetsov @ 2019-04-05 11:10 UTC (permalink / raw)
  To: Maya Nakamura, mikelley, kys, haiyangz, sthemmin, sashal
  Cc: x86, linux-hyperv, linux-kernel

Maya Nakamura <m.maya.nakamura@gmail.com> writes:

> Replace PAGE_SHIFT, PAGE_SIZE, and PAGE_MASK with HV_HYP_PAGE_SHIFT,
> HV_HYP_PAGE_SIZE, and HV_HYP_PAGE_MASK, respectively, because the guest
> page size and hypervisor page size concepts are different, even though
> they happen to be the same value on x86.
>
> Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
> ---
>  arch/x86/hyperv/mmu.c | 15 ++++++++-------
>  1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
> index e65d7fe6489f..175f6dcc7362 100644
> --- a/arch/x86/hyperv/mmu.c
> +++ b/arch/x86/hyperv/mmu.c
> @@ -15,7 +15,7 @@
>  #include <asm/trace/hyperv.h>
>  
>  /* Each gva in gva_list encodes up to 4096 pages to flush */
> -#define HV_TLB_FLUSH_UNIT (4096 * PAGE_SIZE)
> +#define HV_TLB_FLUSH_UNIT (4096 * HV_HYP_PAGE_SIZE)
>  
>  static u64 hyperv_flush_tlb_others_ex(const struct cpumask *cpus,
>  				      const struct flush_tlb_info *info);
> @@ -32,15 +32,15 @@ static inline int fill_gva_list(u64 gva_list[], int offset,
>  	do {
>  		diff = end > cur ? end - cur : 0;
>  
> -		gva_list[gva_n] = cur & PAGE_MASK;
> +		gva_list[gva_n] = cur & HV_HYP_PAGE_MASK;

I'm not sure this is correct: here we're expressing guest virtual
addresses in need of flushing, this should be unrelated to the
hypervisor page size.

>  		/*
>  		 * Lower 12 bits encode the number of additional
>  		 * pages to flush (in addition to the 'cur' page).
>  		 */
>  		if (diff >= HV_TLB_FLUSH_UNIT)
> -			gva_list[gva_n] |= ~PAGE_MASK;
> +			gva_list[gva_n] |= ~HV_HYP_PAGE_MASK;
>  		else if (diff)
> -			gva_list[gva_n] |= (diff - 1) >> PAGE_SHIFT;
> +			gva_list[gva_n] |= (diff - 1) >> HV_HYP_PAGE_SHIFT;
>  
>  		cur += HV_TLB_FLUSH_UNIT;
>  		gva_n++;
> @@ -129,7 +129,8 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus,
>  	 * We can flush not more than max_gvas with one hypercall. Flush the
>  	 * whole address space if we were asked to do more.
>  	 */
> -	max_gvas = (PAGE_SIZE - sizeof(*flush)) / sizeof(flush->gva_list[0]);
> +	max_gvas = (HV_HYP_PAGE_SIZE - sizeof(*flush)) /
> +		    sizeof(flush->gva_list[0]);
>  
>  	if (info->end == TLB_FLUSH_ALL) {
>  		flush->flags |= HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY;
> @@ -200,9 +201,9 @@ static u64 hyperv_flush_tlb_others_ex(const struct cpumask *cpus,
>  	 * whole address space if we were asked to do more.
>  	 */
>  	max_gvas =
> -		(PAGE_SIZE - sizeof(*flush) - nr_bank *
> +		(HV_HYP_PAGE_SIZE - sizeof(*flush) - nr_bank *
>  		 sizeof(flush->hv_vp_set.bank_contents[0])) /
> -		sizeof(flush->gva_list[0]);
> +		 sizeof(flush->gva_list[0]);
>  
>  	if (info->end == TLB_FLUSH_ALL) {
>  		flush->flags |= HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY;

-- 
Vitaly

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/6] x86: hv: hv_init.c: Replace alloc_page() with kmem_cache_alloc()
  2019-04-05  1:14 ` [PATCH 2/6] x86: hv: hv_init.c: Replace alloc_page() with kmem_cache_alloc() Maya Nakamura
@ 2019-04-05 11:31   ` Vitaly Kuznetsov
  2019-04-12  7:24     ` Maya Nakamura
  0 siblings, 1 reply; 17+ messages in thread
From: Vitaly Kuznetsov @ 2019-04-05 11:31 UTC (permalink / raw)
  To: Maya Nakamura, mikelley, kys, haiyangz, sthemmin, sashal
  Cc: x86, linux-hyperv, linux-kernel

Maya Nakamura <m.maya.nakamura@gmail.com> writes:

> Switch from the function that allocates a single Linux guest page to a
> different one to use a Hyper-V page because the guest page size and
> hypervisor page size concepts are different, even though they happen to
> be the same value on x86.
>
> Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
> ---
>  arch/x86/hyperv/hv_init.c | 38 +++++++++++++++++++++++++++-----------
>  1 file changed, 27 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index e4ba467a9fc6..5f946135aa18 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -31,6 +31,7 @@
>  #include <linux/hyperv.h>
>  #include <linux/slab.h>
>  #include <linux/cpuhotplug.h>
> +#include <asm/set_memory.h>
>  
>  #ifdef CONFIG_HYPERV_TSCPAGE
>  
> @@ -98,18 +99,20 @@ EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);
>  u32 hv_max_vp_index;
>  EXPORT_SYMBOL_GPL(hv_max_vp_index);
>  
> +struct kmem_cache *cachep;
> +EXPORT_SYMBOL_GPL(cachep);
> +
>  static int hv_cpu_init(unsigned int cpu)
>  {
>  	u64 msr_vp_index;
>  	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
>  	void **input_arg;
> -	struct page *pg;
>  
>  	input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> -	pg = alloc_page(GFP_KERNEL);
> -	if (unlikely(!pg))
> +	*input_arg = kmem_cache_alloc(cachep, GFP_KERNEL);

I'm not sure use of kmem_cache is justified here: pages we allocate are
not cache-line and all these allocations are supposed to persist for the
lifetime of the guest. In case you think that even on x86 it will be
possible to see PAGE_SIZE != HV_HYP_PAGE_SIZE you can use alloc_pages()
instead.

Also, in case the idea is to generalize stuff, what will happen if
PAGE_SIZE > HV_HYP_PAGE_SIZE? Who will guarantee proper alignment?

I think we can leave hypercall arguments, vp_assist and similar pages
alone for now: the code is not going to be shared among architectures
anyways.

> +
> +	if (unlikely(!*input_arg))
>  		return -ENOMEM;
> -	*input_arg = page_address(pg);
>  
>  	hv_get_vp_index(msr_vp_index);
>  
> @@ -122,14 +125,12 @@ static int hv_cpu_init(unsigned int cpu)
>  		return 0;
>  
>  	if (!*hvp)
> -		*hvp = __vmalloc(PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL);
> +		*hvp = kmem_cache_alloc(cachep, GFP_KERNEL);
>  
>  	if (*hvp) {
>  		u64 val;
>  
> -		val = vmalloc_to_pfn(*hvp);
> -		val = (val << HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT) |
> -			HV_X64_MSR_VP_ASSIST_PAGE_ENABLE;
> +		val = virt_to_phys(*hvp) | HV_X64_MSR_VP_ASSIST_PAGE_ENABLE;
>  
>  		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, val);
>  	}
> @@ -233,17 +234,22 @@ static int hv_cpu_die(unsigned int cpu)
>  	unsigned long flags;
>  	void **input_arg;
>  	void *input_pg = NULL;
> +	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
>  
>  	local_irq_save(flags);
>  	input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
>  	input_pg = *input_arg;
>  	*input_arg = NULL;
>  	local_irq_restore(flags);
> -	free_page((unsigned long)input_pg);
> +	kmem_cache_free(cachep, input_pg);
> +	input_pg = NULL;
>  
>  	if (hv_vp_assist_page && hv_vp_assist_page[cpu])
>  		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);
>  
> +	kmem_cache_free(cachep, *hvp);
> +	*hvp = NULL;
> +
>  	if (hv_reenlightenment_cb == NULL)
>  		return 0;
>  
> @@ -325,6 +331,11 @@ void __init hyperv_init(void)
>  		goto free_vp_index;
>  	}
>  
> +	cachep = kmem_cache_create("hyperv_pages", HV_HYP_PAGE_SIZE,
> +				   HV_HYP_PAGE_SIZE, 0, NULL);
> +	if (!cachep)
> +		goto free_vp_assist_page;
> +
>  	cpuhp = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/hyperv_init:online",
>  				  hv_cpu_init, hv_cpu_die);
>  	if (cpuhp < 0)
> @@ -338,7 +349,10 @@ void __init hyperv_init(void)
>  	guest_id = generate_guest_id(0, LINUX_VERSION_CODE, 0);
>  	wrmsrl(HV_X64_MSR_GUEST_OS_ID, guest_id);
>  
> -	hv_hypercall_pg  = __vmalloc(PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL_RX);
> +	hv_hypercall_pg = kmem_cache_alloc(cachep, GFP_KERNEL);
> +	if (hv_hypercall_pg)
> +		set_memory_x((unsigned long)hv_hypercall_pg, 1);

_RX is not writeable, right?

> +
>  	if (hv_hypercall_pg == NULL) {
>  		wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
>  		goto remove_cpuhp_state;
> @@ -346,7 +360,8 @@ void __init hyperv_init(void)
>  
>  	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>  	hypercall_msr.enable = 1;
> -	hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
> +	hypercall_msr.guest_physical_address = virt_to_phys(hv_hypercall_pg) >>
> +		HV_HYP_PAGE_SHIFT;
>  	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>  
>  	hv_apic_init();
> @@ -416,6 +431,7 @@ void hyperv_cleanup(void)
>  	 * let hypercall operations fail safely rather than
>  	 * panic the kernel for using invalid hypercall page
>  	 */
> +	kmem_cache_free(cachep, hv_hypercall_pg);

Please don't do that: hyperv_cleanup() is called on kexec/kdump and
we're trying to do the bare minimum to allow next kernel to boot. Doing
excessive work here will likely lead to consequent problems (we're
already crashing the case it's kdump!).

>  	hv_hypercall_pg = NULL;
>  
>  	/* Reset the hypercall page */

-- 
Vitaly

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH 4/6] x86: hv: mmu.c: Replace page definitions with Hyper-V specific ones
       [not found]     ` <DM5PR2101MB091843B6DD7A11C2C27917F1D7280@DM5PR2101MB0918.namprd21.prod.outlook.com>
@ 2019-04-12  6:58       ` Vitaly Kuznetsov
  0 siblings, 0 replies; 17+ messages in thread
From: Vitaly Kuznetsov @ 2019-04-12  6:58 UTC (permalink / raw)
  To: Michael Kelley, m.maya.nakamura
  Cc: x86, linux-hyperv, linux-kernel, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, sashal

Michael Kelley <mikelley@microsoft.com> writes:

> From: Vitaly Kuznetsov <vkuznets@redhat.com>  Sent: Friday, April 5, 2019 4:11 AM
>> >
>> > @@ -32,15 +32,15 @@ static inline int fill_gva_list(u64 gva_list[], int offset,
>> >  	do {
>> >  		diff = end > cur ? end - cur : 0;
>> >
>> > -		gva_list[gva_n] = cur & PAGE_MASK;
>> > +		gva_list[gva_n] = cur & HV_HYP_PAGE_MASK;
>> 
>> I'm not sure this is correct: here we're expressing guest virtual
>> addresses in need of flushing, this should be unrelated to the
>> hypervisor page size.
>> 
>
> I talked to the Hyper-V guys about this.  They have not implemented
> the HvFlushVirtualAddressList hypercalls for ARM64 yet.  But when they
> do, they expect the GVA list will need to be in terms of the Hyper-V
> page size.  They don't want to have to figure out the guest page size
> and adjust their arithmetic on a per-guest basis.

Understood, thanks! However, what I wanted to say is that this code is
x86-specific (arch/x86/hyperv/mmu.c) and the implementation is
hard-wired to the spec, namely we use lower 12 bits (there's even a
comment in the code about this which the patch doesn't change) to encode
the number of pages to flush. We can speculate that when these
hypercalls are implemented on ARM they'll have similar requirements but
I'd suggest we leave it as-is for now and fix this when (and, actually,
if) we decide to move this to arch-independent part. We may need to do
additional adjustments - and we don't know about them yet because
there's no spec published.

-- 
Vitaly

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/6] x86: hv: hv_init.c: Replace alloc_page() with kmem_cache_alloc()
  2019-04-05 11:31   ` Vitaly Kuznetsov
@ 2019-04-12  7:24     ` Maya Nakamura
  2019-04-12  7:52       ` Vitaly Kuznetsov
  0 siblings, 1 reply; 17+ messages in thread
From: Maya Nakamura @ 2019-04-12  7:24 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: mikelley, kys, haiyangz, sthemmin, sashal, x86, linux-hyperv,
	linux-kernel

On Fri, Apr 05, 2019 at 01:31:02PM +0200, Vitaly Kuznetsov wrote:
> Maya Nakamura <m.maya.nakamura@gmail.com> writes:
> 
> > @@ -98,18 +99,20 @@ EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);
> >  u32 hv_max_vp_index;
> >  EXPORT_SYMBOL_GPL(hv_max_vp_index);
> >  
> > +struct kmem_cache *cachep;
> > +EXPORT_SYMBOL_GPL(cachep);
> > +
> >  static int hv_cpu_init(unsigned int cpu)
> >  {
> >  	u64 msr_vp_index;
> >  	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
> >  	void **input_arg;
> > -	struct page *pg;
> >  
> >  	input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> > -	pg = alloc_page(GFP_KERNEL);
> > -	if (unlikely(!pg))
> > +	*input_arg = kmem_cache_alloc(cachep, GFP_KERNEL);
> 
> I'm not sure use of kmem_cache is justified here: pages we allocate are
> not cache-line and all these allocations are supposed to persist for the
> lifetime of the guest. In case you think that even on x86 it will be
> possible to see PAGE_SIZE != HV_HYP_PAGE_SIZE you can use alloc_pages()
> instead.
> 
Thank you for your feedback, Vitaly!

Will you please tell me how cache-line relates to kmem_cache?

I understand that alloc_pages() would work when PAGE_SIZE <=
HV_HYP_PAGE_SIZE, but I think that it would not work if PAGE_SIZE >
HV_HYP_PAGE_SIZE.

> Also, in case the idea is to generalize stuff, what will happen if
> PAGE_SIZE > HV_HYP_PAGE_SIZE? Who will guarantee proper alignment?
> 
> I think we can leave hypercall arguments, vp_assist and similar pages
> alone for now: the code is not going to be shared among architectures
> anyways.
> 
About the alignment, kmem_cache_create() aligns memory with its third
parameter, offset.

> > @@ -338,7 +349,10 @@ void __init hyperv_init(void)
> >  	guest_id = generate_guest_id(0, LINUX_VERSION_CODE, 0);
> >  	wrmsrl(HV_X64_MSR_GUEST_OS_ID, guest_id);
> >  
> > -	hv_hypercall_pg  = __vmalloc(PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL_RX);
> > +	hv_hypercall_pg = kmem_cache_alloc(cachep, GFP_KERNEL);
> > +	if (hv_hypercall_pg)
> > +		set_memory_x((unsigned long)hv_hypercall_pg, 1);
> 
> _RX is not writeable, right?
> 
Yes, you are correct. I should use set_memory_ro() in addition to
set_memory_x().

> > @@ -416,6 +431,7 @@ void hyperv_cleanup(void)
> >  	 * let hypercall operations fail safely rather than
> >  	 * panic the kernel for using invalid hypercall page
> >  	 */
> > +	kmem_cache_free(cachep, hv_hypercall_pg);
> 
> Please don't do that: hyperv_cleanup() is called on kexec/kdump and
> we're trying to do the bare minimum to allow next kernel to boot. Doing
> excessive work here will likely lead to consequent problems (we're
> already crashing the case it's kdump!).
> 
Thank you for the explanation! I will remove that.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/6] x86: hv: hv_init.c: Replace alloc_page() with kmem_cache_alloc()
  2019-04-12  7:24     ` Maya Nakamura
@ 2019-04-12  7:52       ` Vitaly Kuznetsov
  2019-05-08  6:46         ` Maya Nakamura
  0 siblings, 1 reply; 17+ messages in thread
From: Vitaly Kuznetsov @ 2019-04-12  7:52 UTC (permalink / raw)
  To: Maya Nakamura
  Cc: mikelley, kys, haiyangz, sthemmin, sashal, x86, linux-hyperv,
	linux-kernel

Maya Nakamura <m.maya.nakamura@gmail.com> writes:

> On Fri, Apr 05, 2019 at 01:31:02PM +0200, Vitaly Kuznetsov wrote:
>> Maya Nakamura <m.maya.nakamura@gmail.com> writes:
>> 
>> > @@ -98,18 +99,20 @@ EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);
>> >  u32 hv_max_vp_index;
>> >  EXPORT_SYMBOL_GPL(hv_max_vp_index);
>> >  
>> > +struct kmem_cache *cachep;
>> > +EXPORT_SYMBOL_GPL(cachep);
>> > +
>> >  static int hv_cpu_init(unsigned int cpu)
>> >  {
>> >  	u64 msr_vp_index;
>> >  	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
>> >  	void **input_arg;
>> > -	struct page *pg;
>> >  
>> >  	input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
>> > -	pg = alloc_page(GFP_KERNEL);
>> > -	if (unlikely(!pg))
>> > +	*input_arg = kmem_cache_alloc(cachep, GFP_KERNEL);
>> 
>> I'm not sure use of kmem_cache is justified here: pages we allocate are
>> not cache-line and all these allocations are supposed to persist for the
>> lifetime of the guest. In case you think that even on x86 it will be
>> possible to see PAGE_SIZE != HV_HYP_PAGE_SIZE you can use alloc_pages()
>> instead.
>> 
> Thank you for your feedback, Vitaly!
>
> Will you please tell me how cache-line relates to kmem_cache?
>
> I understand that alloc_pages() would work when PAGE_SIZE <=
> HV_HYP_PAGE_SIZE, but I think that it would not work if PAGE_SIZE >
> HV_HYP_PAGE_SIZE.

Sorry, my bad: I meant to say "not cache-like" (these allocations are
not 'cache') but the typo made it completely incomprehensible. 

>
>> Also, in case the idea is to generalize stuff, what will happen if
>> PAGE_SIZE > HV_HYP_PAGE_SIZE? Who will guarantee proper alignment?
>> 
>> I think we can leave hypercall arguments, vp_assist and similar pages
>> alone for now: the code is not going to be shared among architectures
>> anyways.
>> 
> About the alignment, kmem_cache_create() aligns memory with its third
> parameter, offset.

Yes, I know, I was trying to think about a (hypothetical) situation when
page sizes differ: what would be the memory alignment requirements from
the hypervisor for e.g. hypercall arguments? In case it's always
HV_HYP_PAGE_SIZE we're good but could it be PAGE_SIZE (for e.g. TLB
flush hypercall)? I don't know. For x86 this discussion probably makes
no sense. I'm, however, struggling to understand what benefit we will
get from the change. Maybe just leave it as-is for now and fix
arch-independent code only? And later, if we decide to generalize this
code, make another approach? (Not insisting, just a suggestion)

>
>> > @@ -338,7 +349,10 @@ void __init hyperv_init(void)
>> >  	guest_id = generate_guest_id(0, LINUX_VERSION_CODE, 0);
>> >  	wrmsrl(HV_X64_MSR_GUEST_OS_ID, guest_id);
>> >  
>> > -	hv_hypercall_pg  = __vmalloc(PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL_RX);
>> > +	hv_hypercall_pg = kmem_cache_alloc(cachep, GFP_KERNEL);
>> > +	if (hv_hypercall_pg)
>> > +		set_memory_x((unsigned long)hv_hypercall_pg, 1);
>> 
>> _RX is not writeable, right?
>> 
> Yes, you are correct. I should use set_memory_ro() in addition to
> set_memory_x().
>
>> > @@ -416,6 +431,7 @@ void hyperv_cleanup(void)
>> >  	 * let hypercall operations fail safely rather than
>> >  	 * panic the kernel for using invalid hypercall page
>> >  	 */
>> > +	kmem_cache_free(cachep, hv_hypercall_pg);
>> 
>> Please don't do that: hyperv_cleanup() is called on kexec/kdump and
>> we're trying to do the bare minimum to allow next kernel to boot. Doing
>> excessive work here will likely lead to consequent problems (we're
>> already crashing the case it's kdump!).
>> 
> Thank you for the explanation! I will remove that.
>

-- 
Vitaly

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/6] x86: hv: hv_init.c: Replace alloc_page() with kmem_cache_alloc()
  2019-04-12  7:52       ` Vitaly Kuznetsov
@ 2019-05-08  6:46         ` Maya Nakamura
  2019-05-08 14:54           ` Vitaly Kuznetsov
  0 siblings, 1 reply; 17+ messages in thread
From: Maya Nakamura @ 2019-05-08  6:46 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: mikelley, kys, haiyangz, sthemmin, sashal, x86, linux-hyperv,
	linux-kernel

On Fri, Apr 12, 2019 at 09:52:47AM +0200, Vitaly Kuznetsov wrote:
> Maya Nakamura <m.maya.nakamura@gmail.com> writes:
> 
> > On Fri, Apr 05, 2019 at 01:31:02PM +0200, Vitaly Kuznetsov wrote:
> >> Maya Nakamura <m.maya.nakamura@gmail.com> writes:
> >> 
> >> > @@ -98,18 +99,20 @@ EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);
> >> >  u32 hv_max_vp_index;
> >> >  EXPORT_SYMBOL_GPL(hv_max_vp_index);
> >> >  
> >> > +struct kmem_cache *cachep;
> >> > +EXPORT_SYMBOL_GPL(cachep);
> >> > +
> >> >  static int hv_cpu_init(unsigned int cpu)
> >> >  {
> >> >  	u64 msr_vp_index;
> >> >  	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
> >> >  	void **input_arg;
> >> > -	struct page *pg;
> >> >  
> >> >  	input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> >> > -	pg = alloc_page(GFP_KERNEL);
> >> > -	if (unlikely(!pg))
> >> > +	*input_arg = kmem_cache_alloc(cachep, GFP_KERNEL);
> >> 
> >> I'm not sure use of kmem_cache is justified here: pages we allocate are
> >> not cache-line and all these allocations are supposed to persist for the
> >> lifetime of the guest. In case you think that even on x86 it will be
> >> possible to see PAGE_SIZE != HV_HYP_PAGE_SIZE you can use alloc_pages()
> >> instead.
> >> 
> > Thank you for your feedback, Vitaly!
> >
> > Will you please tell me how cache-line relates to kmem_cache?
> >
> > I understand that alloc_pages() would work when PAGE_SIZE <=
> > HV_HYP_PAGE_SIZE, but I think that it would not work if PAGE_SIZE >
> > HV_HYP_PAGE_SIZE.
> 
> Sorry, my bad: I meant to say "not cache-like" (these allocations are
> not 'cache') but the typo made it completely incomprehensible. 
 
No worries! Thank you for sharing your thoughts with me, Vitaly.

Do you know of any alternatives to kmem_cache that can allocate memory
in a specified size (different than a guest page size) with alignment?
Memory allocated by alloc_page() is aligned but limited to the guest
page size, and kmalloc() can allocate a particular size but it seems
that it does not guarantee alignment. I am asking this while considering
the changes for architecture independent code.

> >> Also, in case the idea is to generalize stuff, what will happen if
> >> PAGE_SIZE > HV_HYP_PAGE_SIZE? Who will guarantee proper alignment?
> >> 
> >> I think we can leave hypercall arguments, vp_assist and similar pages
> >> alone for now: the code is not going to be shared among architectures
> >> anyways.
> >> 
> > About the alignment, kmem_cache_create() aligns memory with its third
> > parameter, offset.
> 
> Yes, I know, I was trying to think about a (hypothetical) situation when
> page sizes differ: what would be the memory alignment requirements from
> the hypervisor for e.g. hypercall arguments? In case it's always
> HV_HYP_PAGE_SIZE we're good but could it be PAGE_SIZE (for e.g. TLB
> flush hypercall)? I don't know. For x86 this discussion probably makes
> no sense. I'm, however, struggling to understand what benefit we will
> get from the change. Maybe just leave it as-is for now and fix
> arch-independent code only? And later, if we decide to generalize this
> code, make another approach? (Not insisting, just a suggestion)

Thank you for the suggestion, Vitaly!

The introduction of HV_HYP_PAGE_SIZE is weighing the assumption of the
future page size—it can be bigger based on the general trend, not
smaller, which is a reasonable assumption, I think.

> >> > @@ -338,7 +349,10 @@ void __init hyperv_init(void)
> >> >  	guest_id = generate_guest_id(0, LINUX_VERSION_CODE, 0);
> >> >  	wrmsrl(HV_X64_MSR_GUEST_OS_ID, guest_id);
> >> >  
> >> > -	hv_hypercall_pg  = __vmalloc(PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL_RX);
> >> > +	hv_hypercall_pg = kmem_cache_alloc(cachep, GFP_KERNEL);
> >> > +	if (hv_hypercall_pg)
> >> > +		set_memory_x((unsigned long)hv_hypercall_pg, 1);
> >> 
> >> _RX is not writeable, right?
> >> 
> > Yes, you are correct. I should use set_memory_ro() in addition to
> > set_memory_x().
> >
> >> > @@ -416,6 +431,7 @@ void hyperv_cleanup(void)
> >> >  	 * let hypercall operations fail safely rather than
> >> >  	 * panic the kernel for using invalid hypercall page
> >> >  	 */
> >> > +	kmem_cache_free(cachep, hv_hypercall_pg);
> >> 
> >> Please don't do that: hyperv_cleanup() is called on kexec/kdump and
> >> we're trying to do the bare minimum to allow next kernel to boot. Doing
> >> excessive work here will likely lead to consequent problems (we're
> >> already crashing the case it's kdump!).
> >> 
> > Thank you for the explanation! I will remove that.
> >
> 
> -- 
> Vitaly

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/6] x86: hv: hv_init.c: Replace alloc_page() with kmem_cache_alloc()
  2019-05-08  6:46         ` Maya Nakamura
@ 2019-05-08 14:54           ` Vitaly Kuznetsov
       [not found]             ` <BYAPR21MB1317AC7CA4B242106FCAD698CC320@BYAPR21MB1317.namprd21.prod.outlook.com>
       [not found]             ` <MN2PR21MB1232C6ABA5DAC847C8A910E1D70C0@MN2PR21MB1232.namprd21.prod.outlook.com>
  0 siblings, 2 replies; 17+ messages in thread
From: Vitaly Kuznetsov @ 2019-05-08 14:54 UTC (permalink / raw)
  To: Maya Nakamura
  Cc: mikelley, kys, haiyangz, sthemmin, sashal, x86, linux-hyperv,
	linux-kernel

Maya Nakamura <m.maya.nakamura@gmail.com> writes:

> On Fri, Apr 12, 2019 at 09:52:47AM +0200, Vitaly Kuznetsov wrote:
>> Maya Nakamura <m.maya.nakamura@gmail.com> writes:
>> 
>> > On Fri, Apr 05, 2019 at 01:31:02PM +0200, Vitaly Kuznetsov wrote:
>> >> Maya Nakamura <m.maya.nakamura@gmail.com> writes:
>> >> 
>> >> > @@ -98,18 +99,20 @@ EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);
>> >> >  u32 hv_max_vp_index;
>> >> >  EXPORT_SYMBOL_GPL(hv_max_vp_index);
>> >> >  
>> >> > +struct kmem_cache *cachep;
>> >> > +EXPORT_SYMBOL_GPL(cachep);
>> >> > +
>> >> >  static int hv_cpu_init(unsigned int cpu)
>> >> >  {
>> >> >  	u64 msr_vp_index;
>> >> >  	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
>> >> >  	void **input_arg;
>> >> > -	struct page *pg;
>> >> >  
>> >> >  	input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
>> >> > -	pg = alloc_page(GFP_KERNEL);
>> >> > -	if (unlikely(!pg))
>> >> > +	*input_arg = kmem_cache_alloc(cachep, GFP_KERNEL);
>> >> 
>> >> I'm not sure use of kmem_cache is justified here: pages we allocate are
>> >> not cache-line and all these allocations are supposed to persist for the
>> >> lifetime of the guest. In case you think that even on x86 it will be
>> >> possible to see PAGE_SIZE != HV_HYP_PAGE_SIZE you can use alloc_pages()
>> >> instead.
>> >> 
>> > Thank you for your feedback, Vitaly!
>> >
>> > Will you please tell me how cache-line relates to kmem_cache?
>> >
>> > I understand that alloc_pages() would work when PAGE_SIZE <=
>> > HV_HYP_PAGE_SIZE, but I think that it would not work if PAGE_SIZE >
>> > HV_HYP_PAGE_SIZE.
>> 
>> Sorry, my bad: I meant to say "not cache-like" (these allocations are
>> not 'cache') but the typo made it completely incomprehensible. 
>  
> No worries! Thank you for sharing your thoughts with me, Vitaly.
>
> Do you know of any alternatives to kmem_cache that can allocate memory
> in a specified size (different than a guest page size) with alignment?
> Memory allocated by alloc_page() is aligned but limited to the guest
> page size, and kmalloc() can allocate a particular size but it seems
> that it does not guarantee alignment. I am asking this while considering
> the changes for architecture independent code.
>

I think we can consider these allocations being DMA-like (because
Hypervisor accesses this memory too) so you can probably take a look at
dma_pool_create()/dma_pool_alloc() and friends.

>> >> Also, in case the idea is to generalize stuff, what will happen if
>> >> PAGE_SIZE > HV_HYP_PAGE_SIZE? Who will guarantee proper alignment?
>> >> 
>> >> I think we can leave hypercall arguments, vp_assist and similar pages
>> >> alone for now: the code is not going to be shared among architectures
>> >> anyways.
>> >> 
>> > About the alignment, kmem_cache_create() aligns memory with its third
>> > parameter, offset.
>> 
>> Yes, I know, I was trying to think about a (hypothetical) situation when
>> page sizes differ: what would be the memory alignment requirements from
>> the hypervisor for e.g. hypercall arguments? In case it's always
>> HV_HYP_PAGE_SIZE we're good but could it be PAGE_SIZE (for e.g. TLB
>> flush hypercall)? I don't know. For x86 this discussion probably makes
>> no sense. I'm, however, struggling to understand what benefit we will
>> get from the change. Maybe just leave it as-is for now and fix
>> arch-independent code only? And later, if we decide to generalize this
>> code, make another approach? (Not insisting, just a suggestion)
>
> Thank you for the suggestion, Vitaly!
>
> The introduction of HV_HYP_PAGE_SIZE is weighing the assumption of the
> future page size—it can be bigger based on the general trend, not
> smaller, which is a reasonable assumption, I think.
>

Let's spell it out (as BUILD_BUG_ON(HV_HYP_PAGE_SIZE < PAGE_SIZE) or
something like that) then to make it clear.

-- 
Vitaly

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/6] x86: hv: hv_init.c: Replace alloc_page() with kmem_cache_alloc()
       [not found]             ` <BYAPR21MB1317AC7CA4B242106FCAD698CC320@BYAPR21MB1317.namprd21.prod.outlook.com>
@ 2019-05-08 19:53               ` Vitaly Kuznetsov
  0 siblings, 0 replies; 17+ messages in thread
From: Vitaly Kuznetsov @ 2019-05-08 19:53 UTC (permalink / raw)
  To: Stephen Hemminger, m.maya.nakamura
  Cc: Michael Kelley, KY Srinivasan, Haiyang Zhang, sashal, x86,
	linux-hyperv, linux-kernel

Stephen Hemminger <sthemmin@microsoft.com> writes:

> I would worry that kmem_cache_alloc does not currently have same alignment constraints.
> See discussion here:
> https://lwn.net/SubscriberLink/787740/a886fe4ea6681322/

I think it even was me who reported this bug with XFS originally :-) 

Yes, plain kmalloc() doesn't give you alignment guarantees (it is very
easy to prove, e.g. with CONFIG_KASAN), however, kmem_cache_create() (and
dma_pool_create() to that matter) has explicit 'align' parameter and it
is a bug if it is not respected.

-- 
Vitaly

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH 2/6] x86: hv: hv_init.c: Replace alloc_page() with kmem_cache_alloc()
       [not found]             ` <MN2PR21MB1232C6ABA5DAC847C8A910E1D70C0@MN2PR21MB1232.namprd21.prod.outlook.com>
@ 2019-05-10 13:21               ` Vitaly Kuznetsov
       [not found]                 ` <BYAPR21MB1221962ED2DD7FEE19E7DAB6D70C0@BYAPR21MB1221.namprd21.prod.outlook.com>
  0 siblings, 1 reply; 17+ messages in thread
From: Vitaly Kuznetsov @ 2019-05-10 13:21 UTC (permalink / raw)
  To: Michael Kelley, m.maya.nakamura
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, sashal, x86,
	linux-hyperv, linux-kernel

Michael Kelley <mikelley@microsoft.com> writes:

> From: Vitaly Kuznetsov <vkuznets@redhat.com>  Sent: Wednesday, May 8, 2019 7:55 AM
>> >>
>> >> Sorry, my bad: I meant to say "not cache-like" (these allocations are
>> >> not 'cache') but the typo made it completely incomprehensible.
>> >
>> > No worries! Thank you for sharing your thoughts with me, Vitaly.
>> >
>> > Do you know of any alternatives to kmem_cache that can allocate memory
>> > in a specified size (different than a guest page size) with alignment?
>> > Memory allocated by alloc_page() is aligned but limited to the guest
>> > page size, and kmalloc() can allocate a particular size but it seems
>> > that it does not guarantee alignment. I am asking this while considering
>> > the changes for architecture independent code.
>> >
>> 
>> I think we can consider these allocations being DMA-like (because
>> Hypervisor accesses this memory too) so you can probably take a look at
>> dma_pool_create()/dma_pool_alloc() and friends.
>> 
>
> I've taken a look at dma_pool_create(), and it takes a "struct device"
> argument with which the DMA pool will be associated.  That probably
> makes DMA pools a bad choice for this usage.  Pages need to be allocated
> pretty early during boot for Hyper-V communication, and even if the
> device subsystem is initialized early enough to create a fake device,
> such a dependency seems rather dubious.

We can probably use dma_pool_create()/dma_pool_alloc() from vmbus module
but these 'early' allocations may not have a device to bind to indeed.

>
> kmem_cache_create/alloc() seems like the only choice to get
> guaranteed alignment.  Do you see any actual problem with
> using kmem_cache_*, other than the naming?  It seems like these
> kmem_cache_*  functions really just act as a sub-allocator for
> known-size allocations, and "cache" is a common usage
> pattern, but not necessarily the only usage pattern.

Yes, it's basically the name - it makes it harder to read the code and
some future refactoring of kmem_cache_* may not take our use-case into
account (as we're misusing the API). We can try renaming it to something
generic of course and see what -mm people have to say :-)

-- 
Vitaly

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH 2/6] x86: hv: hv_init.c: Replace alloc_page() with kmem_cache_alloc()
       [not found]                 ` <BYAPR21MB1221962ED2DD7FEE19E7DAB6D70C0@BYAPR21MB1221.namprd21.prod.outlook.com>
@ 2019-05-10 17:45                   ` Vitaly Kuznetsov
  0 siblings, 0 replies; 17+ messages in thread
From: Vitaly Kuznetsov @ 2019-05-10 17:45 UTC (permalink / raw)
  To: Michael Kelley, m.maya.nakamura
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, sashal, x86,
	linux-hyperv, linux-kernel

Michael Kelley <mikelley@microsoft.com> writes:

> From: Vitaly Kuznetsov <vkuznets@redhat.com>  Sent: Friday, May 10, 2019 6:22 AM
>> >>
>> >> I think we can consider these allocations being DMA-like (because
>> >> Hypervisor accesses this memory too) so you can probably take a look at
>> >> dma_pool_create()/dma_pool_alloc() and friends.
>> >>
>> >
>> > I've taken a look at dma_pool_create(), and it takes a "struct device"
>> > argument with which the DMA pool will be associated.  That probably
>> > makes DMA pools a bad choice for this usage.  Pages need to be allocated
>> > pretty early during boot for Hyper-V communication, and even if the
>> > device subsystem is initialized early enough to create a fake device,
>> > such a dependency seems rather dubious.
>> 
>> We can probably use dma_pool_create()/dma_pool_alloc() from vmbus module
>> but these 'early' allocations may not have a device to bind to indeed.
>> 
>> >
>> > kmem_cache_create/alloc() seems like the only choice to get
>> > guaranteed alignment.  Do you see any actual problem with
>> > using kmem_cache_*, other than the naming?  It seems like these
>> > kmem_cache_*  functions really just act as a sub-allocator for
>> > known-size allocations, and "cache" is a common usage
>> > pattern, but not necessarily the only usage pattern.
>> 
>> Yes, it's basically the name - it makes it harder to read the code and
>> some future refactoring of kmem_cache_* may not take our use-case into
>> account (as we're misusing the API). We can try renaming it to something
>> generic of course and see what -mm people have to say :-)
>> 
>
> This makes me think of creating Hyper-V specific alloc/free functions
> that wrap whatever the backend allocator actually is.  So we have
> hv_alloc_hyperv_page() and hv_free_hyperv_page().  That makes the
> code very readable and the intent is super clear.
>
> As for the backend allocator, an alternative is to write our own simple
> allocator.  It maintains a single free list.  If hv_alloc_hyperv_page() is
> called, and the free list is empty, do alloc_page() and break it up into
> Hyper-V sized pages to replenish the free list.  (On x86, these end up
> being 1-for-1 operations.)  hv_free_hyperv_page() just puts the Hyper-V
> page back on the free list.  Don't worry trying to combine and do
> free_page() since there's very little free'ing done anyway.  And I'm
> assuming GPF_KERNEL is all we need.
>
> If in the future Linux provides an alternate general-purpose allocator
> that guarantees alignment, we can ditch the simple allocator and use
> the new mechanism with some simple code changes in one place.
>
> Thoughts?
>

+1 for adding wrappers and if the allocator turns out to be more-or-less
trivial I think we can live with that for the time being.

-- 
Vitaly

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2019-05-10 17:45 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-05  1:11 [PATCH 0/6] hv: Remove dependencies on guest page size Maya Nakamura
2019-04-05  1:13 ` [PATCH 1/6] x86: hv: hyperv-tlfs.h: Create and use Hyper-V page definitions Maya Nakamura
2019-04-05  1:14 ` [PATCH 2/6] x86: hv: hv_init.c: Replace alloc_page() with kmem_cache_alloc() Maya Nakamura
2019-04-05 11:31   ` Vitaly Kuznetsov
2019-04-12  7:24     ` Maya Nakamura
2019-04-12  7:52       ` Vitaly Kuznetsov
2019-05-08  6:46         ` Maya Nakamura
2019-05-08 14:54           ` Vitaly Kuznetsov
     [not found]             ` <BYAPR21MB1317AC7CA4B242106FCAD698CC320@BYAPR21MB1317.namprd21.prod.outlook.com>
2019-05-08 19:53               ` Vitaly Kuznetsov
     [not found]             ` <MN2PR21MB1232C6ABA5DAC847C8A910E1D70C0@MN2PR21MB1232.namprd21.prod.outlook.com>
2019-05-10 13:21               ` Vitaly Kuznetsov
     [not found]                 ` <BYAPR21MB1221962ED2DD7FEE19E7DAB6D70C0@BYAPR21MB1221.namprd21.prod.outlook.com>
2019-05-10 17:45                   ` Vitaly Kuznetsov
2019-04-05  1:16 ` [PATCH 3/6] hv: vmbus: Replace page definition with Hyper-V specific one Maya Nakamura
2019-04-05  1:17 ` [PATCH 4/6] x86: hv: mmu.c: Replace page definitions with Hyper-V specific ones Maya Nakamura
2019-04-05 11:10   ` Vitaly Kuznetsov
     [not found]     ` <DM5PR2101MB091843B6DD7A11C2C27917F1D7280@DM5PR2101MB0918.namprd21.prod.outlook.com>
2019-04-12  6:58       ` Vitaly Kuznetsov
2019-04-05  1:19 ` [PATCH 5/6] HID: hv: Remove dependencies on PAGE_SIZE for ring buffer Maya Nakamura
2019-04-05  1:20 ` [PATCH 6/6] Input: " Maya Nakamura

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).