LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH] vmw_balloon: fixing double free when batching mode is off
@ 2018-03-12 19:19 Nadav Amit
  2018-03-12 19:28 ` [PATCH v2] " Nadav Amit
  0 siblings, 1 reply; 19+ messages in thread
From: Nadav Amit @ 2018-03-12 19:19 UTC (permalink / raw)
  To: Arnd Bergmann, Greg Kroah-Hartman
  Cc: Xavier Deguillard, VMware, Inc.,
	linux-kernel, Gil Kupfer, Oleksandr Natalenko, Nadav Amit,
	stable, Nadav Amit

From: Gil Kupfer <gilkup@gmail.com>

The balloon.page field is used for two different purposes if paging is
on or off. If batching is on, the field point to the page which is used
to communicate with with the hypervisor. If it is off, balloon.page
points to the page that is about to be (un)locked.

Unfortunately, this dual-purpose of the field introduced a bug: when the
balloon is popped (e.g., when the machine is reset or the balloon driver
is explicitly removed), the balloon driver frees, unconditionally, the
page that is held in balloon.page.  As a result, if batching is
disabled, this leads to double freeing the last page that is sent to the
hypervisor.

batching and pointer to current page otherwise. When the balloon is
popped (during reset or rmmode), it tries to free the communication
page. If batching is disabled, this leads to double free of the last
page sent to the backend.

The following error occurs during rmmod when kernel checkers are on, and
the balloon is not empty:

[   42.307653] ------------[ cut here ]------------
[   42.307657] Kernel BUG at ffffffffba1e4b28 [verbose debug info unavailable]
[   42.307720] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[   42.312512] Modules linked in: vmw_vsock_vmci_transport vsock ppdev joydev vmw_balloon(-) input_leds serio_raw vmw_vmci parport_pc shpchp parport i2c_piix4 nfit mac_hid autofs4 vmwgfx drm_kms_helper hid_generic syscopyarea sysfillrect usbhid sysimgblt fb_sys_fops hid ttm mptspi scsi_transport_spi ahci mptscsih drm psmouse vmxnet3 libahci mptbase pata_acpi
[   42.312766] CPU: 10 PID: 1527 Comm: rmmod Not tainted 4.12.0+ #5
[   42.312803] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2016
[   42.313042] task: ffff9bf9680f8000 task.stack: ffffbfefc1638000
[   42.313290] RIP: 0010:__free_pages+0x38/0x40
[   42.313510] RSP: 0018:ffffbfefc163be98 EFLAGS: 00010246
[   42.313731] RAX: 000000000000003e RBX: ffffffffc02b9720 RCX: 0000000000000006
[   42.313972] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9bf97e08e0a0
[   42.314201] RBP: ffffbfefc163be98 R08: 0000000000000000 R09: 0000000000000000
[   42.314435] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffc02b97e4
[   42.314505] R13: ffffffffc02b9748 R14: ffffffffc02b9728 R15: 0000000000000200
[   42.314550] FS:  00007f3af5fec700(0000) GS:ffff9bf97e080000(0000) knlGS:0000000000000000
[   42.314599] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   42.314635] CR2: 00007f44f6f4ab24 CR3: 00000003a7d12000 CR4: 00000000000006e0
[   42.314864] Call Trace:
[   42.315774]  vmballoon_pop+0x102/0x130 [vmw_balloon]
[   42.315816]  vmballoon_exit+0x42/0xd64 [vmw_balloon]
[   42.315853]  SyS_delete_module+0x1e2/0x250
[   42.315891]  entry_SYSCALL_64_fastpath+0x23/0xc2
[   42.315924] RIP: 0033:0x7f3af5b0e8e7
[   42.315949] RSP: 002b:00007fffe6ce0148 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[   42.315996] RAX: ffffffffffffffda RBX: 000055be676401e0 RCX: 00007f3af5b0e8e7
[   42.316951] RDX: 000000000000000a RSI: 0000000000000800 RDI: 000055be67640248
[   42.317887] RBP: 0000000000000003 R08: 0000000000000000 R09: 1999999999999999
[   42.318845] R10: 0000000000000883 R11: 0000000000000206 R12: 00007fffe6cdf130
[   42.319755] R13: 0000000000000000 R14: 0000000000000000 R15: 000055be676401e0
[   42.320606] Code: c0 74 1c f0 ff 4f 1c 74 02 5d c3 85 f6 74 07 e8 0f d8 ff ff 5d c3 31 f6 e8 c6 fb ff ff 5d c3 48 c7 c6 c8 0f c5 ba e8 58 be 02 00 <0f> 0b 66 0f 1f 44 00 00 66 66 66 66 90 48 85 ff 75 01 c3 55 48
[   42.323462] RIP: __free_pages+0x38/0x40 RSP: ffffbfefc163be98
[   42.325735] ---[ end trace 872e008e33f81508 ]---

To solve the bug, we eliminate the dual purpose of balloon.page.

Fixes: 220a80f0c2e7 ("VMware balloon: add batching to the vmw_balloon.")
Cc: stable@vger.kernel.org
Reported-by: Oleksandr Natalenko <onatalen@redhat.com>
Signed-off-by: Gil Kupfer <gilkup@gmail.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
---
 drivers/misc/vmw_balloon.c | 23 +++++++----------------
 1 file changed, 7 insertions(+), 16 deletions(-)

diff --git a/drivers/misc/vmw_balloon.c b/drivers/misc/vmw_balloon.c
index 9047c0a529b2..efd733472a35 100644
--- a/drivers/misc/vmw_balloon.c
+++ b/drivers/misc/vmw_balloon.c
@@ -576,15 +576,9 @@ static void vmballoon_pop(struct vmballoon *b)
 		}
 	}
 
-	if (b->batch_page) {
-		vunmap(b->batch_page);
-		b->batch_page = NULL;
-	}
-
-	if (b->page) {
-		__free_page(b->page);
-		b->page = NULL;
-	}
+	/* Clearing the batch_page unconditionally has no adverse effect */
+	free_page((unsigned long)b->batch_page);
+	b->batch_page = NULL;
 }
 
 /*
@@ -991,16 +985,13 @@ static const struct vmballoon_ops vmballoon_batched_ops = {
 
 static bool vmballoon_init_batching(struct vmballoon *b)
 {
-	b->page = alloc_page(VMW_PAGE_ALLOC_NOSLEEP);
-	if (!b->page)
-		return false;
+	struct page *page;
 
-	b->batch_page = vmap(&b->page, 1, VM_MAP, PAGE_KERNEL);
-	if (!b->batch_page) {
-		__free_page(b->page);
+	page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	if (!page)
 		return false;
-	}
 
+	b->batch_page = page_address(page);
 	return true;
 }
 
-- 
2.14.1

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v2] vmw_balloon: fixing double free when batching mode is off
  2018-03-12 19:19 [PATCH] vmw_balloon: fixing double free when batching mode is off Nadav Amit
@ 2018-03-12 19:28 ` Nadav Amit
  2018-03-14  4:02   ` Nadav Amit
  0 siblings, 1 reply; 19+ messages in thread
From: Nadav Amit @ 2018-03-12 19:28 UTC (permalink / raw)
  To: Arnd Bergmann, Greg Kroah-Hartman
  Cc: Xavier Deguillard, VMware, Inc.,
	linux-kernel, Gil Kupfer, Oleksandr Natalenko, Nadav Amit,
	stable, Nadav Amit

From: Gil Kupfer <gilkup@gmail.com>

The balloon.page field is used for two different purposes if batching is
on or off. If batching is on, the field point to the page which is used
to communicate with with the hypervisor. If it is off, balloon.page
points to the page that is about to be (un)locked.

Unfortunately, this dual-purpose of the field introduced a bug: when the
balloon is popped (e.g., when the machine is reset or the balloon driver
is explicitly removed), the balloon driver frees, unconditionally, the
page that is held in balloon.page.  As a result, if batching is
disabled, this leads to double freeing the last page that is sent to the
hypervisor.

The following error occurs during rmmod when kernel checkers are on, and
the balloon is not empty:

[   42.307653] ------------[ cut here ]------------
[   42.307657] Kernel BUG at ffffffffba1e4b28 [verbose debug info unavailable]
[   42.307720] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[   42.312512] Modules linked in: vmw_vsock_vmci_transport vsock ppdev joydev vmw_balloon(-) input_leds serio_raw vmw_vmci parport_pc shpchp parport i2c_piix4 nfit mac_hid autofs4 vmwgfx drm_kms_helper hid_generic syscopyarea sysfillrect usbhid sysimgblt fb_sys_fops hid ttm mptspi scsi_transport_spi ahci mptscsih drm psmouse vmxnet3 libahci mptbase pata_acpi
[   42.312766] CPU: 10 PID: 1527 Comm: rmmod Not tainted 4.12.0+ #5
[   42.312803] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2016
[   42.313042] task: ffff9bf9680f8000 task.stack: ffffbfefc1638000
[   42.313290] RIP: 0010:__free_pages+0x38/0x40
[   42.313510] RSP: 0018:ffffbfefc163be98 EFLAGS: 00010246
[   42.313731] RAX: 000000000000003e RBX: ffffffffc02b9720 RCX: 0000000000000006
[   42.313972] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9bf97e08e0a0
[   42.314201] RBP: ffffbfefc163be98 R08: 0000000000000000 R09: 0000000000000000
[   42.314435] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffc02b97e4
[   42.314505] R13: ffffffffc02b9748 R14: ffffffffc02b9728 R15: 0000000000000200
[   42.314550] FS:  00007f3af5fec700(0000) GS:ffff9bf97e080000(0000) knlGS:0000000000000000
[   42.314599] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   42.314635] CR2: 00007f44f6f4ab24 CR3: 00000003a7d12000 CR4: 00000000000006e0
[   42.314864] Call Trace:
[   42.315774]  vmballoon_pop+0x102/0x130 [vmw_balloon]
[   42.315816]  vmballoon_exit+0x42/0xd64 [vmw_balloon]
[   42.315853]  SyS_delete_module+0x1e2/0x250
[   42.315891]  entry_SYSCALL_64_fastpath+0x23/0xc2
[   42.315924] RIP: 0033:0x7f3af5b0e8e7
[   42.315949] RSP: 002b:00007fffe6ce0148 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[   42.315996] RAX: ffffffffffffffda RBX: 000055be676401e0 RCX: 00007f3af5b0e8e7
[   42.316951] RDX: 000000000000000a RSI: 0000000000000800 RDI: 000055be67640248
[   42.317887] RBP: 0000000000000003 R08: 0000000000000000 R09: 1999999999999999
[   42.318845] R10: 0000000000000883 R11: 0000000000000206 R12: 00007fffe6cdf130
[   42.319755] R13: 0000000000000000 R14: 0000000000000000 R15: 000055be676401e0
[   42.320606] Code: c0 74 1c f0 ff 4f 1c 74 02 5d c3 85 f6 74 07 e8 0f d8 ff ff 5d c3 31 f6 e8 c6 fb ff ff 5d c3 48 c7 c6 c8 0f c5 ba e8 58 be 02 00 <0f> 0b 66 0f 1f 44 00 00 66 66 66 66 90 48 85 ff 75 01 c3 55 48
[   42.323462] RIP: __free_pages+0x38/0x40 RSP: ffffbfefc163be98
[   42.325735] ---[ end trace 872e008e33f81508 ]---

To solve the bug, we eliminate the dual purpose of balloon.page.

Fixes: 220a80f0c2e7 ("VMware balloon: add batching to the vmw_balloon.")
Cc: stable@vger.kernel.org
Reported-by: Oleksandr Natalenko <onatalen@redhat.com>
Signed-off-by: Gil Kupfer <gilkup@gmail.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
---
v2: Fixing commit message

 drivers/misc/vmw_balloon.c | 23 +++++++----------------
 1 file changed, 7 insertions(+), 16 deletions(-)

diff --git a/drivers/misc/vmw_balloon.c b/drivers/misc/vmw_balloon.c
index 9047c0a529b2..efd733472a35 100644
--- a/drivers/misc/vmw_balloon.c
+++ b/drivers/misc/vmw_balloon.c
@@ -576,15 +576,9 @@ static void vmballoon_pop(struct vmballoon *b)
 		}
 	}
 
-	if (b->batch_page) {
-		vunmap(b->batch_page);
-		b->batch_page = NULL;
-	}
-
-	if (b->page) {
-		__free_page(b->page);
-		b->page = NULL;
-	}
+	/* Clearing the batch_page unconditionally has no adverse effect */
+	free_page((unsigned long)b->batch_page);
+	b->batch_page = NULL;
 }
 
 /*
@@ -991,16 +985,13 @@ static const struct vmballoon_ops vmballoon_batched_ops = {
 
 static bool vmballoon_init_batching(struct vmballoon *b)
 {
-	b->page = alloc_page(VMW_PAGE_ALLOC_NOSLEEP);
-	if (!b->page)
-		return false;
+	struct page *page;
 
-	b->batch_page = vmap(&b->page, 1, VM_MAP, PAGE_KERNEL);
-	if (!b->batch_page) {
-		__free_page(b->page);
+	page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	if (!page)
 		return false;
-	}
 
+	b->batch_page = page_address(page);
 	return true;
 }
 
-- 
2.14.1

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2] vmw_balloon: fixing double free when batching mode is off
  2018-03-12 19:28 ` [PATCH v2] " Nadav Amit
@ 2018-03-14  4:02   ` Nadav Amit
  2018-03-14  7:41     ` Oleksandr Natalenko
  2018-03-19 18:52     ` Nadav Amit
  0 siblings, 2 replies; 19+ messages in thread
From: Nadav Amit @ 2018-03-14  4:02 UTC (permalink / raw)
  To: Arnd Bergmann, Greg Kroah-Hartman, Oleksandr Natalenko
  Cc: Xavier Deguillard, pv-drivers, LKML, Gil Kupfer, stable

Nadav Amit <namit@vmware.com> wrote:

> From: Gil Kupfer <gilkup@gmail.com>
> 
> The balloon.page field is used for two different purposes if batching is
> on or off. If batching is on, the field point to the page which is used
> to communicate with with the hypervisor. If it is off, balloon.page
> points to the page that is about to be (un)locked.
> 
> Unfortunately, this dual-purpose of the field introduced a bug: when the
> balloon is popped (e.g., when the machine is reset or the balloon driver
> is explicitly removed), the balloon driver frees, unconditionally, the
> page that is held in balloon.page.  As a result, if batching is
> disabled, this leads to double freeing the last page that is sent to the
> hypervisor.

Oleksandr, if you can confirm that it fixes the bug you encountered, it
would be great.

Greg, Arnd, on your free time, please let me know if there is any issue
with the patch, and whether you can incorporate it, preferably in 4.16,
since it is a bug-fix that was encountered by Red-Hat customers.

Thanks,
Nadav

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2] vmw_balloon: fixing double free when batching mode is off
  2018-03-14  4:02   ` Nadav Amit
@ 2018-03-14  7:41     ` Oleksandr Natalenko
  2018-03-19 18:52     ` Nadav Amit
  1 sibling, 0 replies; 19+ messages in thread
From: Oleksandr Natalenko @ 2018-03-14  7:41 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Xavier Deguillard, pv-drivers,
	LKML, Gil Kupfer, stable

Hello.

On Wed, Mar 14, 2018 at 5:02 AM, Nadav Amit <namit@vmware.com> wrote:
> Oleksandr, if you can confirm that it fixes the bug you encountered, it
> would be great.

Sure, I'm checking this possibility with a couple of customers, and
will reply back once I have some inputs on it.

> Greg, Arnd, on your free time, please let me know if there is any issue
> with the patch, and whether you can incorporate it, preferably in 4.16,
> since it is a bug-fix that was encountered by Red-Hat customers.

-- 
Best regards,
  Oleksandr Natalenko (post-factum)
  Software Maintenance Engineer

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2] vmw_balloon: fixing double free when batching mode is off
  2018-03-14  4:02   ` Nadav Amit
  2018-03-14  7:41     ` Oleksandr Natalenko
@ 2018-03-19 18:52     ` Nadav Amit
  2018-03-20  7:21       ` Oleksandr Natalenko
  1 sibling, 1 reply; 19+ messages in thread
From: Nadav Amit @ 2018-03-19 18:52 UTC (permalink / raw)
  To: Arnd Bergmann, Greg Kroah-Hartman, Oleksandr Natalenko
  Cc: Xavier Deguillard, pv-drivers, LKML, Gil Kupfer, stable

Nadav Amit <namit@vmware.com> wrote:

> Nadav Amit <namit@vmware.com> wrote:
> 
>> From: Gil Kupfer <gilkup@gmail.com>
>> 
>> The balloon.page field is used for two different purposes if batching is
>> on or off. If batching is on, the field point to the page which is used
>> to communicate with with the hypervisor. If it is off, balloon.page
>> points to the page that is about to be (un)locked.
>> 
>> Unfortunately, this dual-purpose of the field introduced a bug: when the
>> balloon is popped (e.g., when the machine is reset or the balloon driver
>> is explicitly removed), the balloon driver frees, unconditionally, the
>> page that is held in balloon.page.  As a result, if batching is
>> disabled, this leads to double freeing the last page that is sent to the
>> hypervisor.
> 
> Oleksandr, if you can confirm that it fixes the bug you encountered, it
> would be great.
> 
> Greg, Arnd, on your free time, please let me know if there is any issue
> with the patch, and whether you can incorporate it, preferably in 4.16,
> since it is a bug-fix that was encountered by Red-Hat customers.

Ping?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2] vmw_balloon: fixing double free when batching mode is off
  2018-03-19 18:52     ` Nadav Amit
@ 2018-03-20  7:21       ` Oleksandr Natalenko
  2018-03-22 20:02         ` Nadav Amit
  0 siblings, 1 reply; 19+ messages in thread
From: Oleksandr Natalenko @ 2018-03-20  7:21 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Xavier Deguillard, pv-drivers,
	LKML, Gil Kupfer, stable, ldu

Hi.

On Mon, Mar 19, 2018 at 7:52 PM, Nadav Amit <namit@vmware.com> wrote:
>> Oleksandr, if you can confirm that it fixes the bug you encountered, it
>> would be great.
>>
>> Greg, Arnd, on your free time, please let me know if there is any issue
>> with the patch, and whether you can incorporate it, preferably in 4.16,
>> since it is a bug-fix that was encountered by Red-Hat customers.
>
> Ping?

No news from me (yet). We depends on the customer here since we are
unable to reproduce the issue in-house. If you have a recipe on how to
trigger it reliably, please let us know, and this will speed up the
verification process.

Thanks.

-- 
Best regards,
  Oleksandr Natalenko (post-factum)
  Software Maintenance Engineer

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2] vmw_balloon: fixing double free when batching mode is off
  2018-03-20  7:21       ` Oleksandr Natalenko
@ 2018-03-22 20:02         ` Nadav Amit
  2018-03-23  8:42           ` Oleksandr Natalenko
  0 siblings, 1 reply; 19+ messages in thread
From: Nadav Amit @ 2018-03-22 20:02 UTC (permalink / raw)
  To: Oleksandr Natalenko
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Xavier Deguillard, pv-drivers,
	LKML, Gil Kupfer, stable, ldu

Oleksandr Natalenko <oleksandr@redhat.com> wrote:

> Hi.
> 
> On Mon, Mar 19, 2018 at 7:52 PM, Nadav Amit <namit@vmware.com> wrote:
>>> Oleksandr, if you can confirm that it fixes the bug you encountered, it
>>> would be great.
>>> 
>>> Greg, Arnd, on your free time, please let me know if there is any issue
>>> with the patch, and whether you can incorporate it, preferably in 4.16,
>>> since it is a bug-fix that was encountered by Red-Hat customers.
>> 
>> Ping?
> 
> No news from me (yet). We depends on the customer here since we are
> unable to reproduce the issue in-house. If you have a recipe on how to
> trigger it reliably, please let us know, and this will speed up the
> verification process.

Actually, rechecking the bug description, I might have rushed. The bug that
I encountered is related to the vunmap() but it is only caused when batching
is off, while the bug you reported has batching on.

It seems that your sources support batching but do not include b91f108a3d54
(“VMware balloon: Treat init like reset”), which I think would have solved
the problem. I don’t see a mainline kernel that supports batching and does
not have this patch, so it would be helpful to see the exact module that you
use.

Regards,
Nadav

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2] vmw_balloon: fixing double free when batching mode is off
  2018-03-22 20:02         ` Nadav Amit
@ 2018-03-23  8:42           ` Oleksandr Natalenko
  2018-04-16 11:36             ` Oleksandr Natalenko
  0 siblings, 1 reply; 19+ messages in thread
From: Oleksandr Natalenko @ 2018-03-23  8:42 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Xavier Deguillard, pv-drivers,
	LKML, Gil Kupfer, stable, ldu

Hi.

On Thu, Mar 22, 2018 at 9:02 PM, Nadav Amit <namit@vmware.com> wrote:
> Actually, rechecking the bug description, I might have rushed. The bug that
> I encountered is related to the vunmap() but it is only caused when batching
> is off, while the bug you reported has batching on.
>
> It seems that your sources support batching but do not include b91f108a3d54
> (“VMware balloon: Treat init like reset”), which I think would have solved
> the problem. I don’t see a mainline kernel that supports batching and does
> not have this patch, so it would be helpful to see the exact module that you
> use.

Correct. The hash is d7568c130d0d0ff1fc5b364fc879b91f108a3d54, actually, though.
Yes, we miss this commit in our code base, but also we've already
provided a test kernel
with this commit included to the customer. Unfortunately, we have no answer yet.

-- 
Best regards,
  Oleksandr Natalenko (post-factum)
  Software Maintenance Engineer

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2] vmw_balloon: fixing double free when batching mode is off
  2018-03-23  8:42           ` Oleksandr Natalenko
@ 2018-04-16 11:36             ` Oleksandr Natalenko
  2018-04-18 17:54               ` Nadav Amit
  0 siblings, 1 reply; 19+ messages in thread
From: Oleksandr Natalenko @ 2018-04-16 11:36 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Xavier Deguillard, pv-drivers,
	LKML, Gil Kupfer, stable, ldu

Hello.

On Fri, Mar 23, 2018 at 9:42 AM, Oleksandr Natalenko
<oleksandr@redhat.com> wrote:
>> Actually, rechecking the bug description, I might have rushed. The bug that
>> I encountered is related to the vunmap() but it is only caused when batching
>> is off, while the bug you reported has batching on.
>>
>> It seems that your sources support batching but do not include b91f108a3d54
>> (“VMware balloon: Treat init like reset”), which I think would have solved
>> the problem. I don’t see a mainline kernel that supports batching and does
>> not have this patch, so it would be helpful to see the exact module that you
>> use.
>
> Correct. The hash is d7568c130d0d0ff1fc5b364fc879b91f108a3d54, actually, though.
> Yes, we miss this commit in our code base, but also we've already
> provided a test kernel
> with this commit included to the customer. Unfortunately, we have no answer yet.

At least for one customer I can confirm that the kernel built with
both d7568c130 and the proposed patch fixes the issue.

Thanks.

-- 
Best regards,
  Oleksandr Natalenko (post-factum)
  Senior Software Maintenance Engineer

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2] vmw_balloon: fixing double free when batching mode is off
  2018-04-16 11:36             ` Oleksandr Natalenko
@ 2018-04-18 17:54               ` Nadav Amit
  2018-04-19  6:38                 ` Greg Kroah-Hartman
  0 siblings, 1 reply; 19+ messages in thread
From: Nadav Amit @ 2018-04-18 17:54 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Arnd Bergmann, Xavier Deguillard, pv-drivers, LKML, Gil Kupfer,
	stable, ldu, Oleksandr Natalenko

Oleksandr Natalenko <oleksandr@redhat.com> wrote:

> Hello.
> 
> On Fri, Mar 23, 2018 at 9:42 AM, Oleksandr Natalenko
> <oleksandr@redhat.com> wrote:
>>> Actually, rechecking the bug description, I might have rushed. The bug that
>>> I encountered is related to the vunmap() but it is only caused when batching
>>> is off, while the bug you reported has batching on.
>>> 
>>> It seems that your sources support batching but do not include b91f108a3d54
>>> (“VMware balloon: Treat init like reset”), which I think would have solved
>>> the problem. I don’t see a mainline kernel that supports batching and does
>>> not have this patch, so it would be helpful to see the exact module that you
>>> use.
>> 
>> Correct. The hash is d7568c130d0d0ff1fc5b364fc879b91f108a3d54, actually, though.
>> Yes, we miss this commit in our code base, but also we've already
>> provided a test kernel
>> with this commit included to the customer. Unfortunately, we have no answer yet.
> 
> At least for one customer I can confirm that the kernel built with
> both d7568c130 and the proposed patch fixes the issue.

Greg,

Can you commit please this patch? Preferably for 4.17?

There is one minor mistake in the commit message - this patch fixes
f220a80f0c2e and not as written. Please let me know if you want a v3 or can
amend it while committing.

Thanks,
Nadav


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2] vmw_balloon: fixing double free when batching mode is off
  2018-04-18 17:54               ` Nadav Amit
@ 2018-04-19  6:38                 ` Greg Kroah-Hartman
  2018-04-19 18:17                   ` [PATCH v3] " Nadav Amit
  0 siblings, 1 reply; 19+ messages in thread
From: Greg Kroah-Hartman @ 2018-04-19  6:38 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Arnd Bergmann, Xavier Deguillard, pv-drivers, LKML, Gil Kupfer,
	stable, ldu, Oleksandr Natalenko

On Wed, Apr 18, 2018 at 05:54:21PM +0000, Nadav Amit wrote:
> Oleksandr Natalenko <oleksandr@redhat.com> wrote:
> 
> > Hello.
> > 
> > On Fri, Mar 23, 2018 at 9:42 AM, Oleksandr Natalenko
> > <oleksandr@redhat.com> wrote:
> >>> Actually, rechecking the bug description, I might have rushed. The bug that
> >>> I encountered is related to the vunmap() but it is only caused when batching
> >>> is off, while the bug you reported has batching on.
> >>> 
> >>> It seems that your sources support batching but do not include b91f108a3d54
> >>> (“VMware balloon: Treat init like reset”), which I think would have solved
> >>> the problem. I don’t see a mainline kernel that supports batching and does
> >>> not have this patch, so it would be helpful to see the exact module that you
> >>> use.
> >> 
> >> Correct. The hash is d7568c130d0d0ff1fc5b364fc879b91f108a3d54, actually, though.
> >> Yes, we miss this commit in our code base, but also we've already
> >> provided a test kernel
> >> with this commit included to the customer. Unfortunately, we have no answer yet.
> > 
> > At least for one customer I can confirm that the kernel built with
> > both d7568c130 and the proposed patch fixes the issue.
> 
> Greg,
> 
> Can you commit please this patch? Preferably for 4.17?
> 
> There is one minor mistake in the commit message - this patch fixes
> f220a80f0c2e and not as written. Please let me know if you want a v3 or can
> amend it while committing.

Please resend, never make a maintainer have to hand-edit a patch.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v3] vmw_balloon: fixing double free when batching mode is off
  2018-04-19  6:38                 ` Greg Kroah-Hartman
@ 2018-04-19 18:17                   ` Nadav Amit
  2018-04-30 17:30                     ` Nadav Amit
  0 siblings, 1 reply; 19+ messages in thread
From: Nadav Amit @ 2018-04-19 18:17 UTC (permalink / raw)
  To: Arnd Bergmann, Greg Kroah-Hartman
  Cc: Xavier Deguillard, VMware, Inc.,
	linux-kernel, Gil Kupfer, Oleksandr Natalenko, ldu, Nadav Amit,
	stable, Nadav Amit

From: Gil Kupfer <gilkup@gmail.com>

The balloon.page field is used for two different purposes if batching is
on or off. If batching is on, the field point to the page which is used
to communicate with with the hypervisor. If it is off, balloon.page
points to the page that is about to be (un)locked.

Unfortunately, this dual-purpose of the field introduced a bug: when the
balloon is popped (e.g., when the machine is reset or the balloon driver
is explicitly removed), the balloon driver frees, unconditionally, the
page that is held in balloon.page.  As a result, if batching is
disabled, this leads to double freeing the last page that is sent to the
hypervisor.

The following error occurs during rmmod when kernel checkers are on, and
the balloon is not empty:

[   42.307653] ------------[ cut here ]------------
[   42.307657] Kernel BUG at ffffffffba1e4b28 [verbose debug info unavailable]
[   42.307720] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[   42.312512] Modules linked in: vmw_vsock_vmci_transport vsock ppdev joydev vmw_balloon(-) input_leds serio_raw vmw_vmci parport_pc shpchp parport i2c_piix4 nfit mac_hid autofs4 vmwgfx drm_kms_helper hid_generic syscopyarea sysfillrect usbhid sysimgblt fb_sys_fops hid ttm mptspi scsi_transport_spi ahci mptscsih drm psmouse vmxnet3 libahci mptbase pata_acpi
[   42.312766] CPU: 10 PID: 1527 Comm: rmmod Not tainted 4.12.0+ #5
[   42.312803] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2016
[   42.313042] task: ffff9bf9680f8000 task.stack: ffffbfefc1638000
[   42.313290] RIP: 0010:__free_pages+0x38/0x40
[   42.313510] RSP: 0018:ffffbfefc163be98 EFLAGS: 00010246
[   42.313731] RAX: 000000000000003e RBX: ffffffffc02b9720 RCX: 0000000000000006
[   42.313972] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9bf97e08e0a0
[   42.314201] RBP: ffffbfefc163be98 R08: 0000000000000000 R09: 0000000000000000
[   42.314435] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffc02b97e4
[   42.314505] R13: ffffffffc02b9748 R14: ffffffffc02b9728 R15: 0000000000000200
[   42.314550] FS:  00007f3af5fec700(0000) GS:ffff9bf97e080000(0000) knlGS:0000000000000000
[   42.314599] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   42.314635] CR2: 00007f44f6f4ab24 CR3: 00000003a7d12000 CR4: 00000000000006e0
[   42.314864] Call Trace:
[   42.315774]  vmballoon_pop+0x102/0x130 [vmw_balloon]
[   42.315816]  vmballoon_exit+0x42/0xd64 [vmw_balloon]
[   42.315853]  SyS_delete_module+0x1e2/0x250
[   42.315891]  entry_SYSCALL_64_fastpath+0x23/0xc2
[   42.315924] RIP: 0033:0x7f3af5b0e8e7
[   42.315949] RSP: 002b:00007fffe6ce0148 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[   42.315996] RAX: ffffffffffffffda RBX: 000055be676401e0 RCX: 00007f3af5b0e8e7
[   42.316951] RDX: 000000000000000a RSI: 0000000000000800 RDI: 000055be67640248
[   42.317887] RBP: 0000000000000003 R08: 0000000000000000 R09: 1999999999999999
[   42.318845] R10: 0000000000000883 R11: 0000000000000206 R12: 00007fffe6cdf130
[   42.319755] R13: 0000000000000000 R14: 0000000000000000 R15: 000055be676401e0
[   42.320606] Code: c0 74 1c f0 ff 4f 1c 74 02 5d c3 85 f6 74 07 e8 0f d8 ff ff 5d c3 31 f6 e8 c6 fb ff ff 5d c3 48 c7 c6 c8 0f c5 ba e8 58 be 02 00 <0f> 0b 66 0f 1f 44 00 00 66 66 66 66 90 48 85 ff 75 01 c3 55 48
[   42.323462] RIP: __free_pages+0x38/0x40 RSP: ffffbfefc163be98
[   42.325735] ---[ end trace 872e008e33f81508 ]---

To solve the bug, we eliminate the dual purpose of balloon.page.

Fixes: f220a80f0c2e ("VMware balloon: add batching to the vmw_balloon.")
Cc: stable@vger.kernel.org
Reported-by: Oleksandr Natalenko <onatalen@redhat.com>
Signed-off-by: Gil Kupfer <gilkup@gmail.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
---
 drivers/misc/vmw_balloon.c | 23 +++++++----------------
 1 file changed, 7 insertions(+), 16 deletions(-)

diff --git a/drivers/misc/vmw_balloon.c b/drivers/misc/vmw_balloon.c
index 9047c0a529b2..efd733472a35 100644
--- a/drivers/misc/vmw_balloon.c
+++ b/drivers/misc/vmw_balloon.c
@@ -576,15 +576,9 @@ static void vmballoon_pop(struct vmballoon *b)
 		}
 	}
 
-	if (b->batch_page) {
-		vunmap(b->batch_page);
-		b->batch_page = NULL;
-	}
-
-	if (b->page) {
-		__free_page(b->page);
-		b->page = NULL;
-	}
+	/* Clearing the batch_page unconditionally has no adverse effect */
+	free_page((unsigned long)b->batch_page);
+	b->batch_page = NULL;
 }
 
 /*
@@ -991,16 +985,13 @@ static const struct vmballoon_ops vmballoon_batched_ops = {
 
 static bool vmballoon_init_batching(struct vmballoon *b)
 {
-	b->page = alloc_page(VMW_PAGE_ALLOC_NOSLEEP);
-	if (!b->page)
-		return false;
+	struct page *page;
 
-	b->batch_page = vmap(&b->page, 1, VM_MAP, PAGE_KERNEL);
-	if (!b->batch_page) {
-		__free_page(b->page);
+	page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	if (!page)
 		return false;
-	}
 
+	b->batch_page = page_address(page);
 	return true;
 }
 
-- 
2.14.1

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3] vmw_balloon: fixing double free when batching mode is off
  2018-04-19 18:17                   ` [PATCH v3] " Nadav Amit
@ 2018-04-30 17:30                     ` Nadav Amit
  2018-05-04 11:26                       ` Oleksandr Natalenko
  2018-05-30  1:21                       ` Nadav Amit
  0 siblings, 2 replies; 19+ messages in thread
From: Nadav Amit @ 2018-04-30 17:30 UTC (permalink / raw)
  To: Arnd Bergmann, Greg Kroah-Hartman
  Cc: Xavier Deguillard, pv-drivers, LKML, Gil Kupfer,
	Oleksandr Natalenko, ldu, stable

Ping. Please consider it for inclusion for rc4.

Nadav Amit <namit@vmware.com> wrote:

> From: Gil Kupfer <gilkup@gmail.com>
> 
> The balloon.page field is used for two different purposes if batching is
> on or off. If batching is on, the field point to the page which is used
> to communicate with with the hypervisor. If it is off, balloon.page
> points to the page that is about to be (un)locked.
> 
> Unfortunately, this dual-purpose of the field introduced a bug: when the
> balloon is popped (e.g., when the machine is reset or the balloon driver
> is explicitly removed), the balloon driver frees, unconditionally, the
> page that is held in balloon.page.  As a result, if batching is
> disabled, this leads to double freeing the last page that is sent to the
> hypervisor.
> 
> The following error occurs during rmmod when kernel checkers are on, and
> the balloon is not empty:
> 
> [   42.307653] ------------[ cut here ]------------
> [   42.307657] Kernel BUG at ffffffffba1e4b28 [verbose debug info unavailable]
> [   42.307720] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
> [   42.312512] Modules linked in: vmw_vsock_vmci_transport vsock ppdev joydev vmw_balloon(-) input_leds serio_raw vmw_vmci parport_pc shpchp parport i2c_piix4 nfit mac_hid autofs4 vmwgfx drm_kms_helper hid_generic syscopyarea sysfillrect usbhid sysimgblt fb_sys_fops hid ttm mptspi scsi_transport_spi ahci mptscsih drm psmouse vmxnet3 libahci mptbase pata_acpi
> [   42.312766] CPU: 10 PID: 1527 Comm: rmmod Not tainted 4.12.0+ #5
> [   42.312803] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2016
> [   42.313042] task: ffff9bf9680f8000 task.stack: ffffbfefc1638000
> [   42.313290] RIP: 0010:__free_pages+0x38/0x40
> [   42.313510] RSP: 0018:ffffbfefc163be98 EFLAGS: 00010246
> [   42.313731] RAX: 000000000000003e RBX: ffffffffc02b9720 RCX: 0000000000000006
> [   42.313972] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9bf97e08e0a0
> [   42.314201] RBP: ffffbfefc163be98 R08: 0000000000000000 R09: 0000000000000000
> [   42.314435] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffc02b97e4
> [   42.314505] R13: ffffffffc02b9748 R14: ffffffffc02b9728 R15: 0000000000000200
> [   42.314550] FS:  00007f3af5fec700(0000) GS:ffff9bf97e080000(0000) knlGS:0000000000000000
> [   42.314599] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   42.314635] CR2: 00007f44f6f4ab24 CR3: 00000003a7d12000 CR4: 00000000000006e0
> [   42.314864] Call Trace:
> [   42.315774]  vmballoon_pop+0x102/0x130 [vmw_balloon]
> [   42.315816]  vmballoon_exit+0x42/0xd64 [vmw_balloon]
> [   42.315853]  SyS_delete_module+0x1e2/0x250
> [   42.315891]  entry_SYSCALL_64_fastpath+0x23/0xc2
> [   42.315924] RIP: 0033:0x7f3af5b0e8e7
> [   42.315949] RSP: 002b:00007fffe6ce0148 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
> [   42.315996] RAX: ffffffffffffffda RBX: 000055be676401e0 RCX: 00007f3af5b0e8e7
> [   42.316951] RDX: 000000000000000a RSI: 0000000000000800 RDI: 000055be67640248
> [   42.317887] RBP: 0000000000000003 R08: 0000000000000000 R09: 1999999999999999
> [   42.318845] R10: 0000000000000883 R11: 0000000000000206 R12: 00007fffe6cdf130
> [   42.319755] R13: 0000000000000000 R14: 0000000000000000 R15: 000055be676401e0
> [   42.320606] Code: c0 74 1c f0 ff 4f 1c 74 02 5d c3 85 f6 74 07 e8 0f d8 ff ff 5d c3 31 f6 e8 c6 fb ff ff 5d c3 48 c7 c6 c8 0f c5 ba e8 58 be 02 00 <0f> 0b 66 0f 1f 44 00 00 66 66 66 66 90 48 85 ff 75 01 c3 55 48
> [   42.323462] RIP: __free_pages+0x38/0x40 RSP: ffffbfefc163be98
> [   42.325735] ---[ end trace 872e008e33f81508 ]---
> 
> To solve the bug, we eliminate the dual purpose of balloon.page.
> 
> Fixes: f220a80f0c2e ("VMware balloon: add batching to the vmw_balloon.")
> Cc: stable@vger.kernel.org
> Reported-by: Oleksandr Natalenko <onatalen@redhat.com>
> Signed-off-by: Gil Kupfer <gilkup@gmail.com>
> Signed-off-by: Nadav Amit <namit@vmware.com>
> Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
> ---
> drivers/misc/vmw_balloon.c | 23 +++++++----------------
> 1 file changed, 7 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/misc/vmw_balloon.c b/drivers/misc/vmw_balloon.c
> index 9047c0a529b2..efd733472a35 100644
> --- a/drivers/misc/vmw_balloon.c
> +++ b/drivers/misc/vmw_balloon.c
> @@ -576,15 +576,9 @@ static void vmballoon_pop(struct vmballoon *b)
> 		}
> 	}
> 
> -	if (b->batch_page) {
> -		vunmap(b->batch_page);
> -		b->batch_page = NULL;
> -	}
> -
> -	if (b->page) {
> -		__free_page(b->page);
> -		b->page = NULL;
> -	}
> +	/* Clearing the batch_page unconditionally has no adverse effect */
> +	free_page((unsigned long)b->batch_page);
> +	b->batch_page = NULL;
> }
> 
> /*
> @@ -991,16 +985,13 @@ static const struct vmballoon_ops vmballoon_batched_ops = {
> 
> static bool vmballoon_init_batching(struct vmballoon *b)
> {
> -	b->page = alloc_page(VMW_PAGE_ALLOC_NOSLEEP);
> -	if (!b->page)
> -		return false;
> +	struct page *page;
> 
> -	b->batch_page = vmap(&b->page, 1, VM_MAP, PAGE_KERNEL);
> -	if (!b->batch_page) {
> -		__free_page(b->page);
> +	page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> +	if (!page)
> 		return false;
> -	}
> 
> +	b->batch_page = page_address(page);
> 	return true;
> }
> 
> -- 
> 2.14.1

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3] vmw_balloon: fixing double free when batching mode is off
  2018-04-30 17:30                     ` Nadav Amit
@ 2018-05-04 11:26                       ` Oleksandr Natalenko
  2018-05-30  1:21                       ` Nadav Amit
  1 sibling, 0 replies; 19+ messages in thread
From: Oleksandr Natalenko @ 2018-05-04 11:26 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Xavier Deguillard, pv-drivers,
	LKML, Gil Kupfer, ldu, stable

Hi.

On Mon, Apr 30, 2018 at 7:30 PM, Nadav Amit <namit@vmware.com> wrote:
> Ping. Please consider it for inclusion for rc4.

FWIW, we've got a positive feedback on the kernel built with this patch, so:

Tested-by: Oleksandr Natalenko <oleksandr@redhat.com>

Thanks.

-- 
Best regards,
  Oleksandr Natalenko (post-factum)
  Senior Software Maintenance Engineer

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3] vmw_balloon: fixing double free when batching mode is off
  2018-04-30 17:30                     ` Nadav Amit
  2018-05-04 11:26                       ` Oleksandr Natalenko
@ 2018-05-30  1:21                       ` Nadav Amit
  2018-05-31 20:56                         ` Nadav Amit
  1 sibling, 1 reply; 19+ messages in thread
From: Nadav Amit @ 2018-05-30  1:21 UTC (permalink / raw)
  To: Arnd Bergmann, Greg Kroah-Hartman
  Cc: Xavier Deguillard, pv-drivers, LKML, Gil Kupfer,
	Oleksandr Natalenko, ldu, stable

Nadav Amit <namit@vmware.com> wrote:

> Ping. Please consider it for inclusion for rc4.
> 
> Nadav Amit <namit@vmware.com> wrote:
> 
>> From: Gil Kupfer <gilkup@gmail.com>
>> 
>> The balloon.page field is used for two different purposes if batching is
>> on or off. If batching is on, the field point to the page which is used
>> to communicate with with the hypervisor. If it is off, balloon.page
>> points to the page that is about to be (un)locked.
>> 
>> Unfortunately, this dual-purpose of the field introduced a bug: when the
>> balloon is popped (e.g., when the machine is reset or the balloon driver
>> is explicitly removed), the balloon driver frees, unconditionally, the
>> page that is held in balloon.page.  As a result, if batching is
>> disabled, this leads to double freeing the last page that is sent to the
>> hypervisor.
>> 
>> The following error occurs during rmmod when kernel checkers are on, and
>> the balloon is not empty:
>> 
>> [   42.307653] ------------[ cut here ]------------
>> [   42.307657] Kernel BUG at ffffffffba1e4b28 [verbose debug info unavailable]
>> [   42.307720] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
>> [   42.312512] Modules linked in: vmw_vsock_vmci_transport vsock ppdev joydev vmw_balloon(-) input_leds serio_raw vmw_vmci parport_pc shpchp parport i2c_piix4 nfit mac_hid autofs4 vmwgfx drm_kms_helper hid_generic syscopyarea sysfillrect usbhid sysimgblt fb_sys_fops hid ttm mptspi scsi_transport_spi ahci mptscsih drm psmouse vmxnet3 libahci mptbase pata_acpi
>> [   42.312766] CPU: 10 PID: 1527 Comm: rmmod Not tainted 4.12.0+ #5
>> [   42.312803] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2016
>> [   42.313042] task: ffff9bf9680f8000 task.stack: ffffbfefc1638000
>> [   42.313290] RIP: 0010:__free_pages+0x38/0x40
>> [   42.313510] RSP: 0018:ffffbfefc163be98 EFLAGS: 00010246
>> [   42.313731] RAX: 000000000000003e RBX: ffffffffc02b9720 RCX: 0000000000000006
>> [   42.313972] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9bf97e08e0a0
>> [   42.314201] RBP: ffffbfefc163be98 R08: 0000000000000000 R09: 0000000000000000
>> [   42.314435] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffc02b97e4
>> [   42.314505] R13: ffffffffc02b9748 R14: ffffffffc02b9728 R15: 0000000000000200
>> [   42.314550] FS:  00007f3af5fec700(0000) GS:ffff9bf97e080000(0000) knlGS:0000000000000000
>> [   42.314599] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [   42.314635] CR2: 00007f44f6f4ab24 CR3: 00000003a7d12000 CR4: 00000000000006e0
>> [   42.314864] Call Trace:
>> [   42.315774]  vmballoon_pop+0x102/0x130 [vmw_balloon]
>> [   42.315816]  vmballoon_exit+0x42/0xd64 [vmw_balloon]
>> [   42.315853]  SyS_delete_module+0x1e2/0x250
>> [   42.315891]  entry_SYSCALL_64_fastpath+0x23/0xc2
>> [   42.315924] RIP: 0033:0x7f3af5b0e8e7
>> [   42.315949] RSP: 002b:00007fffe6ce0148 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
>> [   42.315996] RAX: ffffffffffffffda RBX: 000055be676401e0 RCX: 00007f3af5b0e8e7
>> [   42.316951] RDX: 000000000000000a RSI: 0000000000000800 RDI: 000055be67640248
>> [   42.317887] RBP: 0000000000000003 R08: 0000000000000000 R09: 1999999999999999
>> [   42.318845] R10: 0000000000000883 R11: 0000000000000206 R12: 00007fffe6cdf130
>> [   42.319755] R13: 0000000000000000 R14: 0000000000000000 R15: 000055be676401e0
>> [   42.320606] Code: c0 74 1c f0 ff 4f 1c 74 02 5d c3 85 f6 74 07 e8 0f d8 ff ff 5d c3 31 f6 e8 c6 fb ff ff 5d c3 48 c7 c6 c8 0f c5 ba e8 58 be 02 00 <0f> 0b 66 0f 1f 44 00 00 66 66 66 66 90 48 85 ff 75 01 c3 55 48
>> [   42.323462] RIP: __free_pages+0x38/0x40 RSP: ffffbfefc163be98
>> [   42.325735] ---[ end trace 872e008e33f81508 ]---
>> 
>> To solve the bug, we eliminate the dual purpose of balloon.page.
>> 
>> Fixes: f220a80f0c2e ("VMware balloon: add batching to the vmw_balloon.")
>> Cc: stable@vger.kernel.org
>> Reported-by: Oleksandr Natalenko <onatalen@redhat.com>
>> Signed-off-by: Gil Kupfer <gilkup@gmail.com>
>> Signed-off-by: Nadav Amit <namit@vmware.com>
>> Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
>> ---
>> drivers/misc/vmw_balloon.c | 23 +++++++----------------
>> 1 file changed, 7 insertions(+), 16 deletions(-)
>> 
>> diff --git a/drivers/misc/vmw_balloon.c b/drivers/misc/vmw_balloon.c
>> index 9047c0a529b2..efd733472a35 100644
>> --- a/drivers/misc/vmw_balloon.c
>> +++ b/drivers/misc/vmw_balloon.c
>> @@ -576,15 +576,9 @@ static void vmballoon_pop(struct vmballoon *b)
>> 		}
>> 	}
>> 
>> -	if (b->batch_page) {
>> -		vunmap(b->batch_page);
>> -		b->batch_page = NULL;
>> -	}
>> -
>> -	if (b->page) {
>> -		__free_page(b->page);
>> -		b->page = NULL;
>> -	}
>> +	/* Clearing the batch_page unconditionally has no adverse effect */
>> +	free_page((unsigned long)b->batch_page);
>> +	b->batch_page = NULL;
>> }
>> 
>> /*
>> @@ -991,16 +985,13 @@ static const struct vmballoon_ops vmballoon_batched_ops = {
>> 
>> static bool vmballoon_init_batching(struct vmballoon *b)
>> {
>> -	b->page = alloc_page(VMW_PAGE_ALLOC_NOSLEEP);
>> -	if (!b->page)
>> -		return false;
>> +	struct page *page;
>> 
>> -	b->batch_page = vmap(&b->page, 1, VM_MAP, PAGE_KERNEL);
>> -	if (!b->batch_page) {
>> -		__free_page(b->page);
>> +	page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>> +	if (!page)
>> 		return false;
>> -	}
>> 
>> +	b->batch_page = page_address(page);
>> 	return true;
>> }
>> 
>> -- 
>> 2.14.1

Greg, can you please include this patch?? 4.17 is almost out of the door,
and the last version was sent a month ago.

If you have any reservations, please let me know immediately.

Regards,
Nadav

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3] vmw_balloon: fixing double free when batching mode is off
  2018-05-30  1:21                       ` Nadav Amit
@ 2018-05-31 20:56                         ` Nadav Amit
  2018-06-01  7:59                           ` Greg Kroah-Hartman
  0 siblings, 1 reply; 19+ messages in thread
From: Nadav Amit @ 2018-05-31 20:56 UTC (permalink / raw)
  To: Arnd Bergmann, Greg Kroah-Hartman
  Cc: Xavier Deguillard, pv-drivers, LKML, Gil Kupfer,
	Oleksandr Natalenko, ldu, stable

Nadav Amit <namit@vmware.com> wrote:

> Nadav Amit <namit@vmware.com> wrote:
> 
>> Ping. Please consider it for inclusion for rc4.
>> 
>> Nadav Amit <namit@vmware.com> wrote:
>> 
>>> From: Gil Kupfer <gilkup@gmail.com>
>>> 
>>> The balloon.page field is used for two different purposes if batching is
>>> on or off. If batching is on, the field point to the page which is used
>>> to communicate with with the hypervisor. If it is off, balloon.page
>>> points to the page that is about to be (un)locked.
>>> 
>>> Unfortunately, this dual-purpose of the field introduced a bug: when the
>>> balloon is popped (e.g., when the machine is reset or the balloon driver
>>> is explicitly removed), the balloon driver frees, unconditionally, the
>>> page that is held in balloon.page.  As a result, if batching is
>>> disabled, this leads to double freeing the last page that is sent to the
>>> hypervisor.
>>> 
>>> The following error occurs during rmmod when kernel checkers are on, and
>>> the balloon is not empty:
>>> 
>>> [   42.307653] ------------[ cut here ]------------
>>> [   42.307657] Kernel BUG at ffffffffba1e4b28 [verbose debug info unavailable]
>>> [   42.307720] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
>>> [   42.312512] Modules linked in: vmw_vsock_vmci_transport vsock ppdev joydev vmw_balloon(-) input_leds serio_raw vmw_vmci parport_pc shpchp parport i2c_piix4 nfit mac_hid autofs4 vmwgfx drm_kms_helper hid_generic syscopyarea sysfillrect usbhid sysimgblt fb_sys_fops hid ttm mptspi scsi_transport_spi ahci mptscsih drm psmouse vmxnet3 libahci mptbase pata_acpi
>>> [   42.312766] CPU: 10 PID: 1527 Comm: rmmod Not tainted 4.12.0+ #5
>>> [   42.312803] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2016
>>> [   42.313042] task: ffff9bf9680f8000 task.stack: ffffbfefc1638000
>>> [   42.313290] RIP: 0010:__free_pages+0x38/0x40
>>> [   42.313510] RSP: 0018:ffffbfefc163be98 EFLAGS: 00010246
>>> [   42.313731] RAX: 000000000000003e RBX: ffffffffc02b9720 RCX: 0000000000000006
>>> [   42.313972] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9bf97e08e0a0
>>> [   42.314201] RBP: ffffbfefc163be98 R08: 0000000000000000 R09: 0000000000000000
>>> [   42.314435] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffc02b97e4
>>> [   42.314505] R13: ffffffffc02b9748 R14: ffffffffc02b9728 R15: 0000000000000200
>>> [   42.314550] FS:  00007f3af5fec700(0000) GS:ffff9bf97e080000(0000) knlGS:0000000000000000
>>> [   42.314599] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [   42.314635] CR2: 00007f44f6f4ab24 CR3: 00000003a7d12000 CR4: 00000000000006e0
>>> [   42.314864] Call Trace:
>>> [   42.315774]  vmballoon_pop+0x102/0x130 [vmw_balloon]
>>> [   42.315816]  vmballoon_exit+0x42/0xd64 [vmw_balloon]
>>> [   42.315853]  SyS_delete_module+0x1e2/0x250
>>> [   42.315891]  entry_SYSCALL_64_fastpath+0x23/0xc2
>>> [   42.315924] RIP: 0033:0x7f3af5b0e8e7
>>> [   42.315949] RSP: 002b:00007fffe6ce0148 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
>>> [   42.315996] RAX: ffffffffffffffda RBX: 000055be676401e0 RCX: 00007f3af5b0e8e7
>>> [   42.316951] RDX: 000000000000000a RSI: 0000000000000800 RDI: 000055be67640248
>>> [   42.317887] RBP: 0000000000000003 R08: 0000000000000000 R09: 1999999999999999
>>> [   42.318845] R10: 0000000000000883 R11: 0000000000000206 R12: 00007fffe6cdf130
>>> [   42.319755] R13: 0000000000000000 R14: 0000000000000000 R15: 000055be676401e0
>>> [   42.320606] Code: c0 74 1c f0 ff 4f 1c 74 02 5d c3 85 f6 74 07 e8 0f d8 ff ff 5d c3 31 f6 e8 c6 fb ff ff 5d c3 48 c7 c6 c8 0f c5 ba e8 58 be 02 00 <0f> 0b 66 0f 1f 44 00 00 66 66 66 66 90 48 85 ff 75 01 c3 55 48
>>> [   42.323462] RIP: __free_pages+0x38/0x40 RSP: ffffbfefc163be98
>>> [   42.325735] ---[ end trace 872e008e33f81508 ]---
>>> 
>>> To solve the bug, we eliminate the dual purpose of balloon.page.
>>> 
>>> Fixes: f220a80f0c2e ("VMware balloon: add batching to the vmw_balloon.")
>>> Cc: stable@vger.kernel.org
>>> Reported-by: Oleksandr Natalenko <onatalen@redhat.com>
>>> Signed-off-by: Gil Kupfer <gilkup@gmail.com>
>>> Signed-off-by: Nadav Amit <namit@vmware.com>
>>> Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
>>> ---
>>> drivers/misc/vmw_balloon.c | 23 +++++++----------------
>>> 1 file changed, 7 insertions(+), 16 deletions(-)
>>> 
>>> diff --git a/drivers/misc/vmw_balloon.c b/drivers/misc/vmw_balloon.c
>>> index 9047c0a529b2..efd733472a35 100644
>>> --- a/drivers/misc/vmw_balloon.c
>>> +++ b/drivers/misc/vmw_balloon.c
>>> @@ -576,15 +576,9 @@ static void vmballoon_pop(struct vmballoon *b)
>>> 		}
>>> 	}
>>> 
>>> -	if (b->batch_page) {
>>> -		vunmap(b->batch_page);
>>> -		b->batch_page = NULL;
>>> -	}
>>> -
>>> -	if (b->page) {
>>> -		__free_page(b->page);
>>> -		b->page = NULL;
>>> -	}
>>> +	/* Clearing the batch_page unconditionally has no adverse effect */
>>> +	free_page((unsigned long)b->batch_page);
>>> +	b->batch_page = NULL;
>>> }
>>> 
>>> /*
>>> @@ -991,16 +985,13 @@ static const struct vmballoon_ops vmballoon_batched_ops = {
>>> 
>>> static bool vmballoon_init_batching(struct vmballoon *b)
>>> {
>>> -	b->page = alloc_page(VMW_PAGE_ALLOC_NOSLEEP);
>>> -	if (!b->page)
>>> -		return false;
>>> +	struct page *page;
>>> 
>>> -	b->batch_page = vmap(&b->page, 1, VM_MAP, PAGE_KERNEL);
>>> -	if (!b->batch_page) {
>>> -		__free_page(b->page);
>>> +	page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>> +	if (!page)
>>> 		return false;
>>> -	}
>>> 
>>> +	b->batch_page = page_address(page);
>>> 	return true;
>>> }
>>> 
>>> -- 
>>> 2.14.1
> 
> Greg, can you please include this patch?? 4.17 is almost out of the door,
> and the last version was sent a month ago.
> 
> If you have any reservations, please let me know immediately.

Arnd,

Perhaps you can - the very least - respond (or just include the patch)?

The last version of this patch was sent over a month ago.

Thanks,
Nadav

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v3 RESEND] vmw_balloon: fixing double free when batching mode is off
  2018-06-01 15:00                             ` Nadav Amit
@ 2018-06-01  7:47                               ` Nadav Amit
  0 siblings, 0 replies; 19+ messages in thread
From: Nadav Amit @ 2018-06-01  7:47 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Xavier Deguillard, VMware, Inc.,
	Arnd Bergmann, linux-kernel, Oleksandr Natalenko, Gil Kupfer,
	stable, Nadav Amit

From: Gil Kupfer <gilkup@gmail.com>

The balloon.page field is used for two different purposes if batching is
on or off. If batching is on, the field point to the page which is used
to communicate with with the hypervisor. If it is off, balloon.page
points to the page that is about to be (un)locked.

Unfortunately, this dual-purpose of the field introduced a bug: when the
balloon is popped (e.g., when the machine is reset or the balloon driver
is explicitly removed), the balloon driver frees, unconditionally, the
page that is held in balloon.page.  As a result, if batching is
disabled, this leads to double freeing the last page that is sent to the
hypervisor.

The following error occurs during rmmod when kernel checkers are on, and
the balloon is not empty:

[   42.307653] ------------[ cut here ]------------
[   42.307657] Kernel BUG at ffffffffba1e4b28 [verbose debug info unavailable]
[   42.307720] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[   42.312512] Modules linked in: vmw_vsock_vmci_transport vsock ppdev joydev vmw_balloon(-) input_leds serio_raw vmw_vmci parport_pc shpchp parport i2c_piix4 nfit mac_hid autofs4 vmwgfx drm_kms_helper hid_generic syscopyarea sysfillrect usbhid sysimgblt fb_sys_fops hid ttm mptspi scsi_transport_spi ahci mptscsih drm psmouse vmxnet3 libahci mptbase pata_acpi
[   42.312766] CPU: 10 PID: 1527 Comm: rmmod Not tainted 4.12.0+ #5
[   42.312803] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2016
[   42.313042] task: ffff9bf9680f8000 task.stack: ffffbfefc1638000
[   42.313290] RIP: 0010:__free_pages+0x38/0x40
[   42.313510] RSP: 0018:ffffbfefc163be98 EFLAGS: 00010246
[   42.313731] RAX: 000000000000003e RBX: ffffffffc02b9720 RCX: 0000000000000006
[   42.313972] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9bf97e08e0a0
[   42.314201] RBP: ffffbfefc163be98 R08: 0000000000000000 R09: 0000000000000000
[   42.314435] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffc02b97e4
[   42.314505] R13: ffffffffc02b9748 R14: ffffffffc02b9728 R15: 0000000000000200
[   42.314550] FS:  00007f3af5fec700(0000) GS:ffff9bf97e080000(0000) knlGS:0000000000000000
[   42.314599] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   42.314635] CR2: 00007f44f6f4ab24 CR3: 00000003a7d12000 CR4: 00000000000006e0
[   42.314864] Call Trace:
[   42.315774]  vmballoon_pop+0x102/0x130 [vmw_balloon]
[   42.315816]  vmballoon_exit+0x42/0xd64 [vmw_balloon]
[   42.315853]  SyS_delete_module+0x1e2/0x250
[   42.315891]  entry_SYSCALL_64_fastpath+0x23/0xc2
[   42.315924] RIP: 0033:0x7f3af5b0e8e7
[   42.315949] RSP: 002b:00007fffe6ce0148 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[   42.315996] RAX: ffffffffffffffda RBX: 000055be676401e0 RCX: 00007f3af5b0e8e7
[   42.316951] RDX: 000000000000000a RSI: 0000000000000800 RDI: 000055be67640248
[   42.317887] RBP: 0000000000000003 R08: 0000000000000000 R09: 1999999999999999
[   42.318845] R10: 0000000000000883 R11: 0000000000000206 R12: 00007fffe6cdf130
[   42.319755] R13: 0000000000000000 R14: 0000000000000000 R15: 000055be676401e0
[   42.320606] Code: c0 74 1c f0 ff 4f 1c 74 02 5d c3 85 f6 74 07 e8 0f d8 ff ff 5d c3 31 f6 e8 c6 fb ff ff 5d c3 48 c7 c6 c8 0f c5 ba e8 58 be 02 00 <0f> 0b 66 0f 1f 44 00 00 66 66 66 66 90 48 85 ff 75 01 c3 55 48
[   42.323462] RIP: __free_pages+0x38/0x40 RSP: ffffbfefc163be98
[   42.325735] ---[ end trace 872e008e33f81508 ]---

To solve the bug, we eliminate the dual purpose of balloon.page.

Fixes: f220a80f0c2e ("VMware balloon: add batching to the vmw_balloon.")
Cc: stable@vger.kernel.org
Reported-by: Oleksandr Natalenko <onatalen@redhat.com>
Signed-off-by: Gil Kupfer <gilkup@gmail.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
Tested-by: Oleksandr Natalenko <oleksandr@redhat.com>
---
 drivers/misc/vmw_balloon.c | 23 +++++++----------------
 1 file changed, 7 insertions(+), 16 deletions(-)

diff --git a/drivers/misc/vmw_balloon.c b/drivers/misc/vmw_balloon.c
index 9047c0a529b2..efd733472a35 100644
--- a/drivers/misc/vmw_balloon.c
+++ b/drivers/misc/vmw_balloon.c
@@ -576,15 +576,9 @@ static void vmballoon_pop(struct vmballoon *b)
 		}
 	}
 
-	if (b->batch_page) {
-		vunmap(b->batch_page);
-		b->batch_page = NULL;
-	}
-
-	if (b->page) {
-		__free_page(b->page);
-		b->page = NULL;
-	}
+	/* Clearing the batch_page unconditionally has no adverse effect */
+	free_page((unsigned long)b->batch_page);
+	b->batch_page = NULL;
 }
 
 /*
@@ -991,16 +985,13 @@ static const struct vmballoon_ops vmballoon_batched_ops = {
 
 static bool vmballoon_init_batching(struct vmballoon *b)
 {
-	b->page = alloc_page(VMW_PAGE_ALLOC_NOSLEEP);
-	if (!b->page)
-		return false;
+	struct page *page;
 
-	b->batch_page = vmap(&b->page, 1, VM_MAP, PAGE_KERNEL);
-	if (!b->batch_page) {
-		__free_page(b->page);
+	page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	if (!page)
 		return false;
-	}
 
+	b->batch_page = page_address(page);
 	return true;
 }
 
-- 
2.17.0

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3] vmw_balloon: fixing double free when batching mode is off
  2018-05-31 20:56                         ` Nadav Amit
@ 2018-06-01  7:59                           ` Greg Kroah-Hartman
  2018-06-01 15:00                             ` Nadav Amit
  0 siblings, 1 reply; 19+ messages in thread
From: Greg Kroah-Hartman @ 2018-06-01  7:59 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Arnd Bergmann, Xavier Deguillard, pv-drivers, LKML, Gil Kupfer,
	Oleksandr Natalenko, ldu, stable

On Thu, May 31, 2018 at 08:56:52PM +0000, Nadav Amit wrote:
> Nadav Amit <namit@vmware.com> wrote:
> 
> > Nadav Amit <namit@vmware.com> wrote:
> > 
> >> Ping. Please consider it for inclusion for rc4.
> >> 
> >> Nadav Amit <namit@vmware.com> wrote:
> >> 
> >>> From: Gil Kupfer <gilkup@gmail.com>
> >>> 
> >>> The balloon.page field is used for two different purposes if batching is
> >>> on or off. If batching is on, the field point to the page which is used
> >>> to communicate with with the hypervisor. If it is off, balloon.page
> >>> points to the page that is about to be (un)locked.
> >>> 
> >>> Unfortunately, this dual-purpose of the field introduced a bug: when the
> >>> balloon is popped (e.g., when the machine is reset or the balloon driver
> >>> is explicitly removed), the balloon driver frees, unconditionally, the
> >>> page that is held in balloon.page.  As a result, if batching is
> >>> disabled, this leads to double freeing the last page that is sent to the
> >>> hypervisor.
> >>> 
> >>> The following error occurs during rmmod when kernel checkers are on, and
> >>> the balloon is not empty:
> >>> 
> >>> [   42.307653] ------------[ cut here ]------------
> >>> [   42.307657] Kernel BUG at ffffffffba1e4b28 [verbose debug info unavailable]
> >>> [   42.307720] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
> >>> [   42.312512] Modules linked in: vmw_vsock_vmci_transport vsock ppdev joydev vmw_balloon(-) input_leds serio_raw vmw_vmci parport_pc shpchp parport i2c_piix4 nfit mac_hid autofs4 vmwgfx drm_kms_helper hid_generic syscopyarea sysfillrect usbhid sysimgblt fb_sys_fops hid ttm mptspi scsi_transport_spi ahci mptscsih drm psmouse vmxnet3 libahci mptbase pata_acpi
> >>> [   42.312766] CPU: 10 PID: 1527 Comm: rmmod Not tainted 4.12.0+ #5
> >>> [   42.312803] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2016
> >>> [   42.313042] task: ffff9bf9680f8000 task.stack: ffffbfefc1638000
> >>> [   42.313290] RIP: 0010:__free_pages+0x38/0x40
> >>> [   42.313510] RSP: 0018:ffffbfefc163be98 EFLAGS: 00010246
> >>> [   42.313731] RAX: 000000000000003e RBX: ffffffffc02b9720 RCX: 0000000000000006
> >>> [   42.313972] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9bf97e08e0a0
> >>> [   42.314201] RBP: ffffbfefc163be98 R08: 0000000000000000 R09: 0000000000000000
> >>> [   42.314435] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffc02b97e4
> >>> [   42.314505] R13: ffffffffc02b9748 R14: ffffffffc02b9728 R15: 0000000000000200
> >>> [   42.314550] FS:  00007f3af5fec700(0000) GS:ffff9bf97e080000(0000) knlGS:0000000000000000
> >>> [   42.314599] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> [   42.314635] CR2: 00007f44f6f4ab24 CR3: 00000003a7d12000 CR4: 00000000000006e0
> >>> [   42.314864] Call Trace:
> >>> [   42.315774]  vmballoon_pop+0x102/0x130 [vmw_balloon]
> >>> [   42.315816]  vmballoon_exit+0x42/0xd64 [vmw_balloon]
> >>> [   42.315853]  SyS_delete_module+0x1e2/0x250
> >>> [   42.315891]  entry_SYSCALL_64_fastpath+0x23/0xc2
> >>> [   42.315924] RIP: 0033:0x7f3af5b0e8e7
> >>> [   42.315949] RSP: 002b:00007fffe6ce0148 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
> >>> [   42.315996] RAX: ffffffffffffffda RBX: 000055be676401e0 RCX: 00007f3af5b0e8e7
> >>> [   42.316951] RDX: 000000000000000a RSI: 0000000000000800 RDI: 000055be67640248
> >>> [   42.317887] RBP: 0000000000000003 R08: 0000000000000000 R09: 1999999999999999
> >>> [   42.318845] R10: 0000000000000883 R11: 0000000000000206 R12: 00007fffe6cdf130
> >>> [   42.319755] R13: 0000000000000000 R14: 0000000000000000 R15: 000055be676401e0
> >>> [   42.320606] Code: c0 74 1c f0 ff 4f 1c 74 02 5d c3 85 f6 74 07 e8 0f d8 ff ff 5d c3 31 f6 e8 c6 fb ff ff 5d c3 48 c7 c6 c8 0f c5 ba e8 58 be 02 00 <0f> 0b 66 0f 1f 44 00 00 66 66 66 66 90 48 85 ff 75 01 c3 55 48
> >>> [   42.323462] RIP: __free_pages+0x38/0x40 RSP: ffffbfefc163be98
> >>> [   42.325735] ---[ end trace 872e008e33f81508 ]---
> >>> 
> >>> To solve the bug, we eliminate the dual purpose of balloon.page.
> >>> 
> >>> Fixes: f220a80f0c2e ("VMware balloon: add batching to the vmw_balloon.")
> >>> Cc: stable@vger.kernel.org
> >>> Reported-by: Oleksandr Natalenko <onatalen@redhat.com>
> >>> Signed-off-by: Gil Kupfer <gilkup@gmail.com>
> >>> Signed-off-by: Nadav Amit <namit@vmware.com>
> >>> Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
> >>> ---
> >>> drivers/misc/vmw_balloon.c | 23 +++++++----------------
> >>> 1 file changed, 7 insertions(+), 16 deletions(-)
> >>> 
> >>> diff --git a/drivers/misc/vmw_balloon.c b/drivers/misc/vmw_balloon.c
> >>> index 9047c0a529b2..efd733472a35 100644
> >>> --- a/drivers/misc/vmw_balloon.c
> >>> +++ b/drivers/misc/vmw_balloon.c
> >>> @@ -576,15 +576,9 @@ static void vmballoon_pop(struct vmballoon *b)
> >>> 		}
> >>> 	}
> >>> 
> >>> -	if (b->batch_page) {
> >>> -		vunmap(b->batch_page);
> >>> -		b->batch_page = NULL;
> >>> -	}
> >>> -
> >>> -	if (b->page) {
> >>> -		__free_page(b->page);
> >>> -		b->page = NULL;
> >>> -	}
> >>> +	/* Clearing the batch_page unconditionally has no adverse effect */
> >>> +	free_page((unsigned long)b->batch_page);
> >>> +	b->batch_page = NULL;
> >>> }
> >>> 
> >>> /*
> >>> @@ -991,16 +985,13 @@ static const struct vmballoon_ops vmballoon_batched_ops = {
> >>> 
> >>> static bool vmballoon_init_batching(struct vmballoon *b)
> >>> {
> >>> -	b->page = alloc_page(VMW_PAGE_ALLOC_NOSLEEP);
> >>> -	if (!b->page)
> >>> -		return false;
> >>> +	struct page *page;
> >>> 
> >>> -	b->batch_page = vmap(&b->page, 1, VM_MAP, PAGE_KERNEL);
> >>> -	if (!b->batch_page) {
> >>> -		__free_page(b->page);
> >>> +	page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> >>> +	if (!page)
> >>> 		return false;
> >>> -	}
> >>> 
> >>> +	b->batch_page = page_address(page);
> >>> 	return true;
> >>> }
> >>> 
> >>> -- 
> >>> 2.14.1
> > 
> > Greg, can you please include this patch?? 4.17 is almost out of the door,
> > and the last version was sent a month ago.
> > 
> > If you have any reservations, please let me know immediately.
> 
> Arnd,
> 
> Perhaps you can - the very least - respond (or just include the patch)?
> 
> The last version of this patch was sent over a month ago.

It's too late for 4.17 now, and as this touches core code, I'll wait for
the next merge window cycle.  It's also not even in my patch review
queue anymore, I guess the long "does this solve the problem" delays
that this went through pushed it out.

So I would need it resent no matter what.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3] vmw_balloon: fixing double free when batching mode is off
  2018-06-01  7:59                           ` Greg Kroah-Hartman
@ 2018-06-01 15:00                             ` Nadav Amit
  2018-06-01  7:47                               ` [PATCH v3 RESEND] " Nadav Amit
  0 siblings, 1 reply; 19+ messages in thread
From: Nadav Amit @ 2018-06-01 15:00 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Arnd Bergmann, Xavier Deguillard, pv-drivers, LKML, Gil Kupfer,
	Oleksandr Natalenko, ldu, stable

Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:

> On Thu, May 31, 2018 at 08:56:52PM +0000, Nadav Amit wrote:
>> Nadav Amit <namit@vmware.com> wrote:
>> 
>>> Nadav Amit <namit@vmware.com> wrote:
>>> 
>>>> Ping. Please consider it for inclusion for rc4.
>>>> 
>>>> Nadav Amit <namit@vmware.com> wrote:
>>>> 
>>>>> From: Gil Kupfer <gilkup@gmail.com>
>>>>> 
>>>>> The balloon.page field is used for two different purposes if batching is
>>>>> on or off. If batching is on, the field point to the page which is used
>>>>> to communicate with with the hypervisor. If it is off, balloon.page
>>>>> points to the page that is about to be (un)locked.
>>>>> 
>>>>> Unfortunately, this dual-purpose of the field introduced a bug: when the
>>>>> balloon is popped (e.g., when the machine is reset or the balloon driver
>>>>> is explicitly removed), the balloon driver frees, unconditionally, the
>>>>> page that is held in balloon.page.  As a result, if batching is
>>>>> disabled, this leads to double freeing the last page that is sent to the
>>>>> hypervisor.
>>>>> 
>>>>> The following error occurs during rmmod when kernel checkers are on, and
>>>>> the balloon is not empty:
>>>>> 
>>>>> [   42.307653] ------------[ cut here ]------------
>>>>> [   42.307657] Kernel BUG at ffffffffba1e4b28 [verbose debug info unavailable]
>>>>> [   42.307720] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
>>>>> [   42.312512] Modules linked in: vmw_vsock_vmci_transport vsock ppdev joydev vmw_balloon(-) input_leds serio_raw vmw_vmci parport_pc shpchp parport i2c_piix4 nfit mac_hid autofs4 vmwgfx drm_kms_helper hid_generic syscopyarea sysfillrect usbhid sysimgblt fb_sys_fops hid ttm mptspi scsi_transport_spi ahci mptscsih drm psmouse vmxnet3 libahci mptbase pata_acpi
>>>>> [   42.312766] CPU: 10 PID: 1527 Comm: rmmod Not tainted 4.12.0+ #5
>>>>> [   42.312803] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2016
>>>>> [   42.313042] task: ffff9bf9680f8000 task.stack: ffffbfefc1638000
>>>>> [   42.313290] RIP: 0010:__free_pages+0x38/0x40
>>>>> [   42.313510] RSP: 0018:ffffbfefc163be98 EFLAGS: 00010246
>>>>> [   42.313731] RAX: 000000000000003e RBX: ffffffffc02b9720 RCX: 0000000000000006
>>>>> [   42.313972] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9bf97e08e0a0
>>>>> [   42.314201] RBP: ffffbfefc163be98 R08: 0000000000000000 R09: 0000000000000000
>>>>> [   42.314435] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffc02b97e4
>>>>> [   42.314505] R13: ffffffffc02b9748 R14: ffffffffc02b9728 R15: 0000000000000200
>>>>> [   42.314550] FS:  00007f3af5fec700(0000) GS:ffff9bf97e080000(0000) knlGS:0000000000000000
>>>>> [   42.314599] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> [   42.314635] CR2: 00007f44f6f4ab24 CR3: 00000003a7d12000 CR4: 00000000000006e0
>>>>> [   42.314864] Call Trace:
>>>>> [   42.315774]  vmballoon_pop+0x102/0x130 [vmw_balloon]
>>>>> [   42.315816]  vmballoon_exit+0x42/0xd64 [vmw_balloon]
>>>>> [   42.315853]  SyS_delete_module+0x1e2/0x250
>>>>> [   42.315891]  entry_SYSCALL_64_fastpath+0x23/0xc2
>>>>> [   42.315924] RIP: 0033:0x7f3af5b0e8e7
>>>>> [   42.315949] RSP: 002b:00007fffe6ce0148 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
>>>>> [   42.315996] RAX: ffffffffffffffda RBX: 000055be676401e0 RCX: 00007f3af5b0e8e7
>>>>> [   42.316951] RDX: 000000000000000a RSI: 0000000000000800 RDI: 000055be67640248
>>>>> [   42.317887] RBP: 0000000000000003 R08: 0000000000000000 R09: 1999999999999999
>>>>> [   42.318845] R10: 0000000000000883 R11: 0000000000000206 R12: 00007fffe6cdf130
>>>>> [   42.319755] R13: 0000000000000000 R14: 0000000000000000 R15: 000055be676401e0
>>>>> [   42.320606] Code: c0 74 1c f0 ff 4f 1c 74 02 5d c3 85 f6 74 07 e8 0f d8 ff ff 5d c3 31 f6 e8 c6 fb ff ff 5d c3 48 c7 c6 c8 0f c5 ba e8 58 be 02 00 <0f> 0b 66 0f 1f 44 00 00 66 66 66 66 90 48 85 ff 75 01 c3 55 48
>>>>> [   42.323462] RIP: __free_pages+0x38/0x40 RSP: ffffbfefc163be98
>>>>> [   42.325735] ---[ end trace 872e008e33f81508 ]---
>>>>> 
>>>>> To solve the bug, we eliminate the dual purpose of balloon.page.
>>>>> 
>>>>> Fixes: f220a80f0c2e ("VMware balloon: add batching to the vmw_balloon.")
>>>>> Cc: stable@vger.kernel.org
>>>>> Reported-by: Oleksandr Natalenko <onatalen@redhat.com>
>>>>> Signed-off-by: Gil Kupfer <gilkup@gmail.com>
>>>>> Signed-off-by: Nadav Amit <namit@vmware.com>
>>>>> Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
>>>>> ---
>>>>> drivers/misc/vmw_balloon.c | 23 +++++++----------------
>>>>> 1 file changed, 7 insertions(+), 16 deletions(-)
>>>>> 
>>>>> diff --git a/drivers/misc/vmw_balloon.c b/drivers/misc/vmw_balloon.c
>>>>> index 9047c0a529b2..efd733472a35 100644
>>>>> --- a/drivers/misc/vmw_balloon.c
>>>>> +++ b/drivers/misc/vmw_balloon.c
>>>>> @@ -576,15 +576,9 @@ static void vmballoon_pop(struct vmballoon *b)
>>>>> 		}
>>>>> 	}
>>>>> 
>>>>> -	if (b->batch_page) {
>>>>> -		vunmap(b->batch_page);
>>>>> -		b->batch_page = NULL;
>>>>> -	}
>>>>> -
>>>>> -	if (b->page) {
>>>>> -		__free_page(b->page);
>>>>> -		b->page = NULL;
>>>>> -	}
>>>>> +	/* Clearing the batch_page unconditionally has no adverse effect */
>>>>> +	free_page((unsigned long)b->batch_page);
>>>>> +	b->batch_page = NULL;
>>>>> }
>>>>> 
>>>>> /*
>>>>> @@ -991,16 +985,13 @@ static const struct vmballoon_ops vmballoon_batched_ops = {
>>>>> 
>>>>> static bool vmballoon_init_batching(struct vmballoon *b)
>>>>> {
>>>>> -	b->page = alloc_page(VMW_PAGE_ALLOC_NOSLEEP);
>>>>> -	if (!b->page)
>>>>> -		return false;
>>>>> +	struct page *page;
>>>>> 
>>>>> -	b->batch_page = vmap(&b->page, 1, VM_MAP, PAGE_KERNEL);
>>>>> -	if (!b->batch_page) {
>>>>> -		__free_page(b->page);
>>>>> +	page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>>>> +	if (!page)
>>>>> 		return false;
>>>>> -	}
>>>>> 
>>>>> +	b->batch_page = page_address(page);
>>>>> 	return true;
>>>>> }
>>>>> 
>>>>> -- 
>>>>> 2.14.1
>>> 
>>> Greg, can you please include this patch?? 4.17 is almost out of the door,
>>> and the last version was sent a month ago.
>>> 
>>> If you have any reservations, please let me know immediately.
>> 
>> Arnd,
>> 
>> Perhaps you can - the very least - respond (or just include the patch)?
>> 
>> The last version of this patch was sent over a month ago.
> 
> It's too late for 4.17 now, and as this touches core code, I'll wait for
> the next merge window cycle.  It's also not even in my patch review
> queue anymore, I guess the long "does this solve the problem" delays
> that this went through pushed it out.
> 
> So I would need it resent no matter what.

There was no delay from our side. This whole interaction regarding a simple
fix is really frustrating. I will resend the patch.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2018-06-01 15:02 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-12 19:19 [PATCH] vmw_balloon: fixing double free when batching mode is off Nadav Amit
2018-03-12 19:28 ` [PATCH v2] " Nadav Amit
2018-03-14  4:02   ` Nadav Amit
2018-03-14  7:41     ` Oleksandr Natalenko
2018-03-19 18:52     ` Nadav Amit
2018-03-20  7:21       ` Oleksandr Natalenko
2018-03-22 20:02         ` Nadav Amit
2018-03-23  8:42           ` Oleksandr Natalenko
2018-04-16 11:36             ` Oleksandr Natalenko
2018-04-18 17:54               ` Nadav Amit
2018-04-19  6:38                 ` Greg Kroah-Hartman
2018-04-19 18:17                   ` [PATCH v3] " Nadav Amit
2018-04-30 17:30                     ` Nadav Amit
2018-05-04 11:26                       ` Oleksandr Natalenko
2018-05-30  1:21                       ` Nadav Amit
2018-05-31 20:56                         ` Nadav Amit
2018-06-01  7:59                           ` Greg Kroah-Hartman
2018-06-01 15:00                             ` Nadav Amit
2018-06-01  7:47                               ` [PATCH v3 RESEND] " Nadav Amit

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).