LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH 0/2] Drivers: hv: hv_balloon: Fix a deadlock in the hot-add path. 
@ 2014-12-06  0:41 K. Y. Srinivasan
  2014-12-06  0:41 ` [PATCH 1/2] Drivers: base: core: Export functions to lock/unlock device hotplug lock K. Y. Srinivasan
  2014-12-08  0:07 ` [PATCH 0/2] Drivers: hv: hv_balloon: Fix a deadlock in the hot-add path Yasuaki Ishimatsu
  0 siblings, 2 replies; 12+ messages in thread
From: K. Y. Srinivasan @ 2014-12-06  0:41 UTC (permalink / raw)
  To: gregkh, linux-kernel, devel, olaf, apw, linux-mm, isimatu.yasuaki
  Cc: K. Y. Srinivasan

Fix a deadlock in the hot-add path in the Hyper-V balloon driver.

K. Y. Srinivasan (2):
  Drivers: base: core: Export functions to lock/unlock device hotplug
    lock
  Drivers: hv: balloon: Fix the deadlock issue in the memory hot-add
    code

 drivers/base/core.c     |    2 ++
 drivers/hv/hv_balloon.c |    4 ++++
 2 files changed, 6 insertions(+), 0 deletions(-)

-- 
1.7.4.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/2] Drivers: base: core: Export functions to lock/unlock device hotplug lock
  2014-12-06  0:41 [PATCH 0/2] Drivers: hv: hv_balloon: Fix a deadlock in the hot-add path K. Y. Srinivasan
@ 2014-12-06  0:41 ` K. Y. Srinivasan
  2014-12-06  0:41   ` [PATCH 2/2] Drivers: hv: balloon: Fix the deadlock issue in the memory hot-add code K. Y. Srinivasan
  2014-12-08  0:07 ` [PATCH 0/2] Drivers: hv: hv_balloon: Fix a deadlock in the hot-add path Yasuaki Ishimatsu
  1 sibling, 1 reply; 12+ messages in thread
From: K. Y. Srinivasan @ 2014-12-06  0:41 UTC (permalink / raw)
  To: gregkh, linux-kernel, devel, olaf, apw, linux-mm, isimatu.yasuaki
  Cc: K. Y. Srinivasan

The Hyper-V balloon driver does memory hot-add. The device_hotplug_lock
is designed to address AB BA deadlock issues between the hot-add path
and the sysfs path. Export the APIs to acquire and release the
device_hotplug_lock for use by loadable modules that want to
hot-add memory or CPU.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
---
 drivers/base/core.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 97e2baf..b3073af 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -55,11 +55,13 @@ void lock_device_hotplug(void)
 {
 	mutex_lock(&device_hotplug_lock);
 }
+EXPORT_SYMBOL_GPL(lock_device_hotplug);
 
 void unlock_device_hotplug(void)
 {
 	mutex_unlock(&device_hotplug_lock);
 }
+EXPORT_SYMBOL_GPL(unlock_device_hotplug);
 
 int lock_device_hotplug_sysfs(void)
 {
-- 
1.7.4.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 2/2] Drivers: hv: balloon: Fix the deadlock issue in the memory hot-add code
  2014-12-06  0:41 ` [PATCH 1/2] Drivers: base: core: Export functions to lock/unlock device hotplug lock K. Y. Srinivasan
@ 2014-12-06  0:41   ` K. Y. Srinivasan
  2014-12-08 15:04     ` Michal Hocko
  0 siblings, 1 reply; 12+ messages in thread
From: K. Y. Srinivasan @ 2014-12-06  0:41 UTC (permalink / raw)
  To: gregkh, linux-kernel, devel, olaf, apw, linux-mm, isimatu.yasuaki
  Cc: K. Y. Srinivasan

Andy Whitcroft <apw@canonical.com> initially saw this deadlock. We
have seen this as well. Here is the original description of the
problem (and a potential solution) from Andy:

https://lkml.org/lkml/2014/3/14/451

Here is an excerpt from that mail:

"We are seeing machines lockup with what appears to be an ABBA
deadlock in the memory hotplug system.  These are from the 3.13.6 based Ubuntu kernels.
The hv_balloon driver is adding memory using add_memory() which takes
the hotplug lock, and then emits a udev event, and then attempts to
lock the sysfs device.  In response to the udev event udev opens the
sysfs device and locks it, then attempts to grab the hotplug lock to online the memory.
This seems to be inverted nesting in the two cases, leading to the hangs below:

[  240.608612] INFO: task kworker/0:2:861 blocked for more than 120 seconds.
[  240.608705] INFO: task systemd-udevd:1906 blocked for more than 120 seconds.

I note that the device hotplug locking allows complete retries (via
ERESTARTSYS) and if we could detect this at the online stage it could
be used to get us out.  But before I go down this road I wanted to
make sure I am reading this right.  Or indeed if the hv_balloon driver
is just doing this wrong."

This patch is based on the suggestion from
Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
---
 drivers/hv/hv_balloon.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index afdb0d5..f525a62 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -22,6 +22,7 @@
 #include <linux/jiffies.h>
 #include <linux/mman.h>
 #include <linux/delay.h>
+#include <linux/device.h>
 #include <linux/init.h>
 #include <linux/module.h>
 #include <linux/slab.h>
@@ -649,8 +650,11 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 
 		release_region_mutex(false);
 		nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
+
+		lock_device_hotplug();
 		ret = add_memory(nid, PFN_PHYS((start_pfn)),
 				(HA_CHUNK << PAGE_SHIFT));
+		unlock_device_hotplug();
 
 		if (ret) {
 			pr_info("hot_add memory failed error is %d\n", ret);
-- 
1.7.4.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] Drivers: hv: hv_balloon: Fix a deadlock in the hot-add path.
  2014-12-06  0:41 [PATCH 0/2] Drivers: hv: hv_balloon: Fix a deadlock in the hot-add path K. Y. Srinivasan
  2014-12-06  0:41 ` [PATCH 1/2] Drivers: base: core: Export functions to lock/unlock device hotplug lock K. Y. Srinivasan
@ 2014-12-08  0:07 ` Yasuaki Ishimatsu
  1 sibling, 0 replies; 12+ messages in thread
From: Yasuaki Ishimatsu @ 2014-12-08  0:07 UTC (permalink / raw)
  To: K. Y. Srinivasan, gregkh, linux-kernel, devel, olaf, apw, linux-mm

(2014/12/06 9:41), K. Y. Srinivasan wrote:
> Fix a deadlock in the hot-add path in the Hyper-V balloon driver.
> 

> K. Y. Srinivasan (2):
>    Drivers: base: core: Export functions to lock/unlock device hotplug
>      lock
>    Drivers: hv: balloon: Fix the deadlock issue in the memory hot-add
>      code

Looks good to me.

Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

Thanks,
Yasauaki Ishimatsu

> 
>   drivers/base/core.c     |    2 ++
>   drivers/hv/hv_balloon.c |    4 ++++
>   2 files changed, 6 insertions(+), 0 deletions(-)
> 



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] Drivers: hv: balloon: Fix the deadlock issue in the memory hot-add code
  2014-12-06  0:41   ` [PATCH 2/2] Drivers: hv: balloon: Fix the deadlock issue in the memory hot-add code K. Y. Srinivasan
@ 2014-12-08 15:04     ` Michal Hocko
  2014-12-09  1:23       ` Yasuaki Ishimatsu
  0 siblings, 1 reply; 12+ messages in thread
From: Michal Hocko @ 2014-12-08 15:04 UTC (permalink / raw)
  To: K. Y. Srinivasan
  Cc: gregkh, linux-kernel, devel, olaf, apw, linux-mm, isimatu.yasuaki

On Fri 05-12-14 16:41:38, K. Y. Srinivasan wrote:
> Andy Whitcroft <apw@canonical.com> initially saw this deadlock. We
> have seen this as well. Here is the original description of the
> problem (and a potential solution) from Andy:
> 
> https://lkml.org/lkml/2014/3/14/451
> 
> Here is an excerpt from that mail:
> 
> "We are seeing machines lockup with what appears to be an ABBA
> deadlock in the memory hotplug system.  These are from the 3.13.6 based Ubuntu kernels.
> The hv_balloon driver is adding memory using add_memory() which takes
> the hotplug lock

Do you mean mem_hotplug_begin?

> and then emits a udev event, and then attempts to
> lock the sysfs device.  In response to the udev event udev opens the
> sysfs device and locks it, then attempts to grab the hotplug lock to online the memory.

Cannot we simply teach online_pages to fail with EBUSY when the memory
hotplug is on the way. We shouldn't try to online something that is not
initialized yet, no? The memory hotplug log is global so we can get
false positives but that should be easier to deal with than exporting
lock_device_hotplug and adding yet another lock dependency.

> This seems to be inverted nesting in the two cases, leading to the hangs below:
> 
> [  240.608612] INFO: task kworker/0:2:861 blocked for more than 120 seconds.
> [  240.608705] INFO: task systemd-udevd:1906 blocked for more than 120 seconds.
> 
> I note that the device hotplug locking allows complete retries (via
> ERESTARTSYS) and if we could detect this at the online stage it could
> be used to get us out.

I am not sure I understand this but it suggests EBUSY above?

> But before I go down this road I wanted to
> make sure I am reading this right.  Or indeed if the hv_balloon driver
> is just doing this wrong."
> 
> This patch is based on the suggestion from
> Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

This changelog doesn't explain us much. And boy this whole thing is so
convoluted. E.g. I have hard time to see why ACPI hotplug is working
correctly. My trail got lost at acpi_memory_device_add level which is
a callback while acpi_device_hotplug is holding lock_device_hotplug but
then again the rest is hidden by callbacks. I cannot seem to find any
documentation which would explain all the locking here.

So why other callers of add_memory don't need the same treatment and if
they do then why don't we use the lock at add_memory level?
 
> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>

Nak to this without proper explanation and I really think that it should
be the onlining code which should deal with the parallel add_memory and
back off until the full initialization is done.

> ---
>  drivers/hv/hv_balloon.c |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
> index afdb0d5..f525a62 100644
> --- a/drivers/hv/hv_balloon.c
> +++ b/drivers/hv/hv_balloon.c
> @@ -22,6 +22,7 @@
>  #include <linux/jiffies.h>
>  #include <linux/mman.h>
>  #include <linux/delay.h>
> +#include <linux/device.h>
>  #include <linux/init.h>
>  #include <linux/module.h>
>  #include <linux/slab.h>
> @@ -649,8 +650,11 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
>  
>  		release_region_mutex(false);
>  		nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
> +
> +		lock_device_hotplug();
>  		ret = add_memory(nid, PFN_PHYS((start_pfn)),
>  				(HA_CHUNK << PAGE_SHIFT));
> +		unlock_device_hotplug();
>  
>  		if (ret) {
>  			pr_info("hot_add memory failed error is %d\n", ret);
> -- 
> 1.7.4.1
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] Drivers: hv: balloon: Fix the deadlock issue in the memory hot-add code
  2014-12-08 15:04     ` Michal Hocko
@ 2014-12-09  1:23       ` Yasuaki Ishimatsu
  2014-12-09  9:08         ` Michal Hocko
  0 siblings, 1 reply; 12+ messages in thread
From: Yasuaki Ishimatsu @ 2014-12-09  1:23 UTC (permalink / raw)
  To: Michal Hocko, K. Y. Srinivasan
  Cc: gregkh, linux-kernel, devel, olaf, apw, linux-mm

(2014/12/09 0:04), Michal Hocko wrote:
> On Fri 05-12-14 16:41:38, K. Y. Srinivasan wrote:
>> Andy Whitcroft <apw@canonical.com> initially saw this deadlock. We
>> have seen this as well. Here is the original description of the
>> problem (and a potential solution) from Andy:
>>
>> https://lkml.org/lkml/2014/3/14/451
>>
>> Here is an excerpt from that mail:
>>
>> "We are seeing machines lockup with what appears to be an ABBA
>> deadlock in the memory hotplug system.  These are from the 3.13.6 based Ubuntu kernels.
>> The hv_balloon driver is adding memory using add_memory() which takes
>> the hotplug lock
>
> Do you mean mem_hotplug_begin?
>

>> and then emits a udev event, and then attempts to
>> lock the sysfs device.  In response to the udev event udev opens the
>> sysfs device and locks it, then attempts to grab the hotplug lock to online the memory.
>
> Cannot we simply teach online_pages to fail with EBUSY when the memory
> hotplug is on the way.  We shouldn't try to online something that is not
> initialized yet, no?

Yes. Memory online shouldn't try before initializing it. Then memory online
should wait for initializing it, not easily fails. Generally, kernel sends
memory ONLINE event to userland by kobject_uevent() during initializing memory
and udev makes memory online after catching the event. Onlining memory by
udev almost run during initializing memory. So if memory online easily
fails, onlining memory by udev almost fails.

> The memory hotplug log is global so we can get
> false positives but that should be easier to deal with than exporting
> lock_device_hotplug and adding yet another lock dependency.
>
>> This seems to be inverted nesting in the two cases, leading to the hangs below:
>>
>> [  240.608612] INFO: task kworker/0:2:861 blocked for more than 120 seconds.
>> [  240.608705] INFO: task systemd-udevd:1906 blocked for more than 120 seconds.
>>
>> I note that the device hotplug locking allows complete retries (via
>> ERESTARTSYS) and if we could detect this at the online stage it could
>> be used to get us out.
>
> I am not sure I understand this but it suggests EBUSY above?
>
>> But before I go down this road I wanted to
>> make sure I am reading this right.  Or indeed if the hv_balloon driver
>> is just doing this wrong."
>>
>> This patch is based on the suggestion from
>> Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>
> This changelog doesn't explain us much. And boy this whole thing is so
> convoluted. E.g. I have hard time to see why ACPI hotplug is working
> correctly. My trail got lost at acpi_memory_device_add level which is
> a callback while acpi_device_hotplug is holding lock_device_hotplug but
> then again the rest is hidden by callbacks.

Commit 0f1cfe9d0d06 (mm/hotplug: remove stop_machine() from try_offline_node()) said:

   ---
     lock_device_hotplug() serializes hotplug & online/offline operations.  The
     lock is held in common sysfs online/offline interfaces and ACPI hotplug
     code paths.

     And here are the code paths:

     - CPU & Mem online/offline via sysfs online
         store_online()->lock_device_hotplug()

     - Mem online via sysfs state:
         store_mem_state()->lock_device_hotplug()

     - ACPI CPU & Mem hot-add:
         acpi_scan_bus_device_check()->lock_device_hotplug()

     - ACPI CPU & Mem hot-delete:
         acpi_scan_hot_remove()->lock_device_hotplug()
   ---

CPU & Memory online/offline/hotplug are serialized by lock_device_hotplug().

> I cannot seem to find any
> documentation which would explain all the locking here.

Yes. I need the documentation.

Thanks,
Yasuaki Ishimatsu

>
> So why other callers of add_memory don't need the same treatment and if
> they do then why don't we use the lock at add_memory level?
>
>> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
>
> Nak to this without proper explanation and I really think that it should
> be the onlining code which should deal with the parallel add_memory and
> back off until the full initialization is done.
>
>> ---
>>   drivers/hv/hv_balloon.c |    4 ++++
>>   1 files changed, 4 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
>> index afdb0d5..f525a62 100644
>> --- a/drivers/hv/hv_balloon.c
>> +++ b/drivers/hv/hv_balloon.c
>> @@ -22,6 +22,7 @@
>>   #include <linux/jiffies.h>
>>   #include <linux/mman.h>
>>   #include <linux/delay.h>
>> +#include <linux/device.h>
>>   #include <linux/init.h>
>>   #include <linux/module.h>
>>   #include <linux/slab.h>
>> @@ -649,8 +650,11 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
>>
>>   		release_region_mutex(false);
>>   		nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
>> +
>> +		lock_device_hotplug();
>>   		ret = add_memory(nid, PFN_PHYS((start_pfn)),
>>   				(HA_CHUNK << PAGE_SHIFT));
>> +		unlock_device_hotplug();
>>
>>   		if (ret) {
>>   			pr_info("hot_add memory failed error is %d\n", ret);
>> --
>> 1.7.4.1
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] Drivers: hv: balloon: Fix the deadlock issue in the memory hot-add code
  2014-12-09  1:23       ` Yasuaki Ishimatsu
@ 2014-12-09  9:08         ` Michal Hocko
  2014-12-09 10:25           ` Yasuaki Ishimatsu
  0 siblings, 1 reply; 12+ messages in thread
From: Michal Hocko @ 2014-12-09  9:08 UTC (permalink / raw)
  To: Yasuaki Ishimatsu
  Cc: K. Y. Srinivasan, gregkh, linux-kernel, devel, olaf, apw, linux-mm

On Tue 09-12-14 10:23:51, Yasuaki Ishimatsu wrote:
> (2014/12/09 0:04), Michal Hocko wrote:
> >On Fri 05-12-14 16:41:38, K. Y. Srinivasan wrote:
> >>Andy Whitcroft <apw@canonical.com> initially saw this deadlock. We
> >>have seen this as well. Here is the original description of the
> >>problem (and a potential solution) from Andy:
> >>
> >>https://lkml.org/lkml/2014/3/14/451
> >>
> >>Here is an excerpt from that mail:
> >>
> >>"We are seeing machines lockup with what appears to be an ABBA
> >>deadlock in the memory hotplug system.  These are from the 3.13.6 based Ubuntu kernels.
> >>The hv_balloon driver is adding memory using add_memory() which takes
> >>the hotplug lock
> >
> >Do you mean mem_hotplug_begin?
> >
> 
> >>and then emits a udev event, and then attempts to
> >>lock the sysfs device.  In response to the udev event udev opens the
> >>sysfs device and locks it, then attempts to grab the hotplug lock to online the memory.
> >
> >Cannot we simply teach online_pages to fail with EBUSY when the memory
> >hotplug is on the way.  We shouldn't try to online something that is not
> >initialized yet, no?
> 
> Yes. Memory online shouldn't try before initializing it. Then memory online
> should wait for initializing it, not easily fails. Generally, kernel sends
> memory ONLINE event to userland by kobject_uevent() during initializing memory
> and udev makes memory online after catching the event. Onlining memory by
> udev almost run during initializing memory.

I guess this is because the event is sent after a mem section is
initialized while the overal hotplug operation is still not completed.

> So if memory online easily fails, onlining memory by udev almost
> fails.

Doesn't udev retry the operation if it gets EBUSY or EAGAIN?

> >The memory hotplug log is global so we can get
> >false positives but that should be easier to deal with than exporting
> >lock_device_hotplug and adding yet another lock dependency.
> >
> >>This seems to be inverted nesting in the two cases, leading to the hangs below:
> >>
> >>[  240.608612] INFO: task kworker/0:2:861 blocked for more than 120 seconds.
> >>[  240.608705] INFO: task systemd-udevd:1906 blocked for more than 120 seconds.
> >>
> >>I note that the device hotplug locking allows complete retries (via
> >>ERESTARTSYS) and if we could detect this at the online stage it could
> >>be used to get us out.
> >
> >I am not sure I understand this but it suggests EBUSY above?
> >
> >>But before I go down this road I wanted to
> >>make sure I am reading this right.  Or indeed if the hv_balloon driver
> >>is just doing this wrong."
> >>
> >>This patch is based on the suggestion from
> >>Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> >
> >This changelog doesn't explain us much. And boy this whole thing is so
> >convoluted. E.g. I have hard time to see why ACPI hotplug is working
> >correctly. My trail got lost at acpi_memory_device_add level which is
> >a callback while acpi_device_hotplug is holding lock_device_hotplug but
> >then again the rest is hidden by callbacks.
> 
> Commit 0f1cfe9d0d06 (mm/hotplug: remove stop_machine() from try_offline_node()) said:
> 
>   ---
>     lock_device_hotplug() serializes hotplug & online/offline operations.  The
>     lock is held in common sysfs online/offline interfaces and ACPI hotplug
>     code paths.
> 
>     And here are the code paths:
> 
>     - CPU & Mem online/offline via sysfs online
>         store_online()->lock_device_hotplug()
> 
>     - Mem online via sysfs state:
>         store_mem_state()->lock_device_hotplug()
> 
>     - ACPI CPU & Mem hot-add:
>         acpi_scan_bus_device_check()->lock_device_hotplug()
> 
>     - ACPI CPU & Mem hot-delete:
>         acpi_scan_hot_remove()->lock_device_hotplug()
>   ---
> 
> CPU & Memory online/offline/hotplug are serialized by lock_device_hotplug().

OK, this patch aimed at the complete nodes hotplug. I am not familiar
with the code enough to tell whether this is really needed but it sounds
like an overkill when we are interested only in the memory hotplug. Why
would we need stop_machine or anything for memory that is guaranteed to
be not used at the time of both online and offline.
And again, why cannot we simply make the onlining fail or try_lock and
retry internally if the event consumer cannot cope with errors?
And even if that is not possible then do not export lock_device_hotplug
but export a memory hotplug functions which use it properly so that
other consumers (xen ballon seem to rely on add_memory as well) do not
need the same change as well.

> >I cannot seem to find any documentation which would explain all the
> >locking here.
> 
> Yes. I need the documentation.

Yes please! This code is incredibly hard to follow and deduce all the
hidden requirements and dependencies is even harder.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] Drivers: hv: balloon: Fix the deadlock issue in the memory hot-add code
  2014-12-09  9:08         ` Michal Hocko
@ 2014-12-09 10:25           ` Yasuaki Ishimatsu
  2014-12-09 10:55             ` Michal Hocko
  0 siblings, 1 reply; 12+ messages in thread
From: Yasuaki Ishimatsu @ 2014-12-09 10:25 UTC (permalink / raw)
  To: Michal Hocko
  Cc: K. Y. Srinivasan, gregkh, linux-kernel, devel, olaf, apw, linux-mm

(2014/12/09 18:08), Michal Hocko wrote:
> On Tue 09-12-14 10:23:51, Yasuaki Ishimatsu wrote:
>> (2014/12/09 0:04), Michal Hocko wrote:
>>> On Fri 05-12-14 16:41:38, K. Y. Srinivasan wrote:
>>>> Andy Whitcroft <apw@canonical.com> initially saw this deadlock. We
>>>> have seen this as well. Here is the original description of the
>>>> problem (and a potential solution) from Andy:
>>>>
>>>> https://lkml.org/lkml/2014/3/14/451
>>>>
>>>> Here is an excerpt from that mail:
>>>>
>>>> "We are seeing machines lockup with what appears to be an ABBA
>>>> deadlock in the memory hotplug system.  These are from the 3.13.6 based Ubuntu kernels.
>>>> The hv_balloon driver is adding memory using add_memory() which takes
>>>> the hotplug lock
>>>
>>> Do you mean mem_hotplug_begin?
>>>
>>
>>>> and then emits a udev event, and then attempts to
>>>> lock the sysfs device.  In response to the udev event udev opens the
>>>> sysfs device and locks it, then attempts to grab the hotplug lock to online the memory.
>>>
>>> Cannot we simply teach online_pages to fail with EBUSY when the memory
>>> hotplug is on the way.  We shouldn't try to online something that is not
>>> initialized yet, no?
>>
>> Yes. Memory online shouldn't try before initializing it. Then memory online
>> should wait for initializing it, not easily fails. Generally, kernel sends
>> memory ONLINE event to userland by kobject_uevent() during initializing memory
>> and udev makes memory online after catching the event. Onlining memory by
>> udev almost run during initializing memory.
>
> I guess this is because the event is sent after a mem section is
> initialized while the overal hotplug operation is still not completed.

right.

>
>> So if memory online easily fails, onlining memory by udev almost
>> fails.
>
> Doesn't udev retry the operation if it gets EBUSY or EAGAIN?

It depend on implementation of udev.rules. So we can retry online/offline
operation in udev.rules.

>
>>> The memory hotplug log is global so we can get
>>> false positives but that should be easier to deal with than exporting
>>> lock_device_hotplug and adding yet another lock dependency.
>>>
>>>> This seems to be inverted nesting in the two cases, leading to the hangs below:
>>>>
>>>> [  240.608612] INFO: task kworker/0:2:861 blocked for more than 120 seconds.
>>>> [  240.608705] INFO: task systemd-udevd:1906 blocked for more than 120 seconds.
>>>>
>>>> I note that the device hotplug locking allows complete retries (via
>>>> ERESTARTSYS) and if we could detect this at the online stage it could
>>>> be used to get us out.
>>>
>>> I am not sure I understand this but it suggests EBUSY above?
>>>
>>>> But before I go down this road I wanted to
>>>> make sure I am reading this right.  Or indeed if the hv_balloon driver
>>>> is just doing this wrong."
>>>>
>>>> This patch is based on the suggestion from
>>>> Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>>>
>>> This changelog doesn't explain us much. And boy this whole thing is so
>>> convoluted. E.g. I have hard time to see why ACPI hotplug is working
>>> correctly. My trail got lost at acpi_memory_device_add level which is
>>> a callback while acpi_device_hotplug is holding lock_device_hotplug but
>>> then again the rest is hidden by callbacks.
>>
>> Commit 0f1cfe9d0d06 (mm/hotplug: remove stop_machine() from try_offline_node()) said:
>>
>>    ---
>>      lock_device_hotplug() serializes hotplug & online/offline operations.  The
>>      lock is held in common sysfs online/offline interfaces and ACPI hotplug
>>      code paths.
>>
>>      And here are the code paths:
>>
>>      - CPU & Mem online/offline via sysfs online
>>          store_online()->lock_device_hotplug()
>>
>>      - Mem online via sysfs state:
>>          store_mem_state()->lock_device_hotplug()
>>
>>      - ACPI CPU & Mem hot-add:
>>          acpi_scan_bus_device_check()->lock_device_hotplug()
>>
>>      - ACPI CPU & Mem hot-delete:
>>          acpi_scan_hot_remove()->lock_device_hotplug()
>>    ---
>>
>> CPU & Memory online/offline/hotplug are serialized by lock_device_hotplug().
>
> OK, this patch aimed at the complete nodes hotplug. I am not familiar
> with the code enough to tell whether this is really needed but it sounds
> like an overkill when we are interested only in the memory hotplug. Why
> would we need stop_machine or anything for memory that is guaranteed to
> be not used at the time of both online and offline.

As Srinivasan mentioned, there is ABBA issue. So to evade it, lock_device_hotplug()
is needed.

> And again, why cannot we simply make the onlining fail or try_lock and
> retry internally if the event consumer cannot cope with errors?

Did you mean the following Srinivasan's first patch looks good to you?
   https://lkml.org/lkml/2014/12/2/662

Thanks,
Yasuaki Ishimatsu

> And even if that is not possible then do not export lock_device_hotplug
> but export a memory hotplug functions which use it properly so that
> other consumers (xen ballon seem to rely on add_memory as well) do not
> need the same change as well.


>
>>> I cannot seem to find any documentation which would explain all the
>>> locking here.
>>
>> Yes. I need the documentation.
>
> Yes please! This code is incredibly hard to follow and deduce all the
> hidden requirements and dependencies is even harder.
>



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] Drivers: hv: balloon: Fix the deadlock issue in the memory hot-add code
  2014-12-09 10:25           ` Yasuaki Ishimatsu
@ 2014-12-09 10:55             ` Michal Hocko
  2014-12-11  0:21               ` KY Srinivasan
  0 siblings, 1 reply; 12+ messages in thread
From: Michal Hocko @ 2014-12-09 10:55 UTC (permalink / raw)
  To: Yasuaki Ishimatsu
  Cc: K. Y. Srinivasan, gregkh, linux-kernel, devel, olaf, apw, linux-mm

On Tue 09-12-14 19:25:50, Yasuaki Ishimatsu wrote:
> (2014/12/09 18:08), Michal Hocko wrote:
[...]
> >Doesn't udev retry the operation if it gets EBUSY or EAGAIN?
> 
> It depend on implementation of udev.rules. So we can retry online/offline
> operation in udev.rules.
[...]

# Memory hotadd request
SUBSYSTEM=="memory", ACTION=="add", DEVPATH=="/devices/system/memory/memory*[0-9]", TEST=="/sys$devpath/state", RUN+="/bin/sh -c 'echo online > /sys$devpath/state'"

OK so this is not prepared for a temporary failures and retries.

> >And again, why cannot we simply make the onlining fail or try_lock and
> >retry internally if the event consumer cannot cope with errors?
> 
> Did you mean the following Srinivasan's first patch looks good to you?
>   https://lkml.org/lkml/2014/12/2/662

Heh, I was just about to post this. Because I haven't noticed the
previous patch yet. Yeah, Something like that. Except that I would
expect EAGAIN or EBUSY rather than ERESTARTSYS which should never leak
into userspace. And that would happen here AFAICS because signal_pending
will not be true usually.

So there are two options. Either make the udev rule more robust and
retry within RUN section or do the retry withing online_pages (try_lock
and go into interruptible sleep which gets signaled by finished
add_memory()). The later option is safer wrt. the userspace because the
operation wouldn't fail unexpectedly.
Another option would be generating the sysfs file after all the internal
initialization is done and call it outside of the memory hotplug lock.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH 2/2] Drivers: hv: balloon: Fix the deadlock issue in the memory hot-add code
  2014-12-09 10:55             ` Michal Hocko
@ 2014-12-11  0:21               ` KY Srinivasan
  2014-12-11 12:58                 ` Michal Hocko
  0 siblings, 1 reply; 12+ messages in thread
From: KY Srinivasan @ 2014-12-11  0:21 UTC (permalink / raw)
  To: Michal Hocko, Yasuaki Ishimatsu
  Cc: gregkh, linux-kernel, devel, olaf, apw, linux-mm



> -----Original Message-----
> From: Michal Hocko [mailto:mhocko@suse.cz]
> Sent: Tuesday, December 9, 2014 2:56 AM
> To: Yasuaki Ishimatsu
> Cc: KY Srinivasan; gregkh@linuxfoundation.org; linux-
> kernel@vger.kernel.org; devel@linuxdriverproject.org; olaf@aepfle.de;
> apw@canonical.com; linux-mm@kvack.org
> Subject: Re: [PATCH 2/2] Drivers: hv: balloon: Fix the deadlock issue in the
> memory hot-add code
> 
> On Tue 09-12-14 19:25:50, Yasuaki Ishimatsu wrote:
> > (2014/12/09 18:08), Michal Hocko wrote:
> [...]
> > >Doesn't udev retry the operation if it gets EBUSY or EAGAIN?
> >
> > It depend on implementation of udev.rules. So we can retry
> > online/offline operation in udev.rules.
> [...]
> 
> # Memory hotadd request
> SUBSYSTEM=="memory", ACTION=="add",
> DEVPATH=="/devices/system/memory/memory*[0-9]",
> TEST=="/sys$devpath/state", RUN+="/bin/sh -c 'echo online >
> /sys$devpath/state'"
> 
> OK so this is not prepared for a temporary failures and retries.
> 
> > >And again, why cannot we simply make the onlining fail or try_lock
> > >and retry internally if the event consumer cannot cope with errors?
> >
> > Did you mean the following Srinivasan's first patch looks good to you?
> >   https://lkml.org/lkml/2014/12/2/662
> 
> Heh, I was just about to post this. Because I haven't noticed the previous
> patch yet. Yeah, Something like that. Except that I would expect EAGAIN or
> EBUSY rather than ERESTARTSYS which should never leak into userspace. And
> that would happen here AFAICS because signal_pending will not be true
> usually.
Michal,

I agree that the fix to this problem must be outside the clients of  add_memory() and that
is the reason I had sent that patch:  https://lkml.org/lkml/2014/12/2/662. Let me know if
you want me to resend this patch with the correct return value.

Regards,

K. Y
> 
> So there are two options. Either make the udev rule more robust and retry
> within RUN section or do the retry withing online_pages (try_lock and go into
> interruptible sleep which gets signaled by finished add_memory()). The later
> option is safer wrt. the userspace because the operation wouldn't fail
> unexpectedly.
> Another option would be generating the sysfs file after all the internal
> initialization is done and call it outside of the memory hotplug lock.
> 
> --
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] Drivers: hv: balloon: Fix the deadlock issue in the memory hot-add code
  2014-12-11  0:21               ` KY Srinivasan
@ 2014-12-11 12:58                 ` Michal Hocko
  2014-12-14 18:45                   ` KY Srinivasan
  0 siblings, 1 reply; 12+ messages in thread
From: Michal Hocko @ 2014-12-11 12:58 UTC (permalink / raw)
  To: KY Srinivasan
  Cc: Yasuaki Ishimatsu, gregkh, linux-kernel, devel, olaf, apw, linux-mm

On Thu 11-12-14 00:21:09, KY Srinivasan wrote:
> 
> 
> > -----Original Message-----
> > From: Michal Hocko [mailto:mhocko@suse.cz]
> > Sent: Tuesday, December 9, 2014 2:56 AM
> > To: Yasuaki Ishimatsu
> > Cc: KY Srinivasan; gregkh@linuxfoundation.org; linux-
> > kernel@vger.kernel.org; devel@linuxdriverproject.org; olaf@aepfle.de;
> > apw@canonical.com; linux-mm@kvack.org
> > Subject: Re: [PATCH 2/2] Drivers: hv: balloon: Fix the deadlock issue in the
> > memory hot-add code
> > 
> > On Tue 09-12-14 19:25:50, Yasuaki Ishimatsu wrote:
> > > (2014/12/09 18:08), Michal Hocko wrote:
> > [...]
> > > >Doesn't udev retry the operation if it gets EBUSY or EAGAIN?
> > >
> > > It depend on implementation of udev.rules. So we can retry
> > > online/offline operation in udev.rules.
> > [...]
> > 
> > # Memory hotadd request
> > SUBSYSTEM=="memory", ACTION=="add",
> > DEVPATH=="/devices/system/memory/memory*[0-9]",
> > TEST=="/sys$devpath/state", RUN+="/bin/sh -c 'echo online >
> > /sys$devpath/state'"
> > 
> > OK so this is not prepared for a temporary failures and retries.
> > 
> > > >And again, why cannot we simply make the onlining fail or try_lock
> > > >and retry internally if the event consumer cannot cope with errors?
> > >
> > > Did you mean the following Srinivasan's first patch looks good to you?
> > >   https://lkml.org/lkml/2014/12/2/662
> > 
> > Heh, I was just about to post this. Because I haven't noticed the previous
> > patch yet. Yeah, Something like that. Except that I would expect EAGAIN or
> > EBUSY rather than ERESTARTSYS which should never leak into userspace. And
> > that would happen here AFAICS because signal_pending will not be true
> > usually.
> Michal,
> 
> I agree that the fix to this problem must be outside the clients
> of add_memory() and that is the reason I had sent that patch:
> https://lkml.org/lkml/2014/12/2/662. Let me know if you want me to
> resend this patch with the correct return value.

Please think about the other suggested options as well.
 
> Regards,
> 
> K. Y
> > 
> > So there are two options. Either make the udev rule more robust and retry
> > within RUN section or do the retry withing online_pages (try_lock and go into
> > interruptible sleep which gets signaled by finished add_memory()). The later
> > option is safer wrt. the userspace because the operation wouldn't fail
> > unexpectedly.
> > Another option would be generating the sysfs file after all the internal
> > initialization is done and call it outside of the memory hotplug lock.
> > 
> > --
> > Michal Hocko
> > SUSE Labs

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH 2/2] Drivers: hv: balloon: Fix the deadlock issue in the memory hot-add code
  2014-12-11 12:58                 ` Michal Hocko
@ 2014-12-14 18:45                   ` KY Srinivasan
  0 siblings, 0 replies; 12+ messages in thread
From: KY Srinivasan @ 2014-12-14 18:45 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Yasuaki Ishimatsu, gregkh, linux-kernel, devel, olaf, apw, linux-mm



> -----Original Message-----
> From: Michal Hocko [mailto:mhocko@suse.cz]
> Sent: Thursday, December 11, 2014 4:58 AM
> To: KY Srinivasan
> Cc: Yasuaki Ishimatsu; gregkh@linuxfoundation.org; linux-
> kernel@vger.kernel.org; devel@linuxdriverproject.org; olaf@aepfle.de;
> apw@canonical.com; linux-mm@kvack.org
> Subject: Re: [PATCH 2/2] Drivers: hv: balloon: Fix the deadlock issue in the
> memory hot-add code
> 
> On Thu 11-12-14 00:21:09, KY Srinivasan wrote:
> >
> >
> > > -----Original Message-----
> > > From: Michal Hocko [mailto:mhocko@suse.cz]
> > > Sent: Tuesday, December 9, 2014 2:56 AM
> > > To: Yasuaki Ishimatsu
> > > Cc: KY Srinivasan; gregkh@linuxfoundation.org; linux-
> > > kernel@vger.kernel.org; devel@linuxdriverproject.org;
> > > olaf@aepfle.de; apw@canonical.com; linux-mm@kvack.org
> > > Subject: Re: [PATCH 2/2] Drivers: hv: balloon: Fix the deadlock
> > > issue in the memory hot-add code
> > >
> > > On Tue 09-12-14 19:25:50, Yasuaki Ishimatsu wrote:
> > > > (2014/12/09 18:08), Michal Hocko wrote:
> > > [...]
> > > > >Doesn't udev retry the operation if it gets EBUSY or EAGAIN?
> > > >
> > > > It depend on implementation of udev.rules. So we can retry
> > > > online/offline operation in udev.rules.
> > > [...]
> > >
> > > # Memory hotadd request
> > > SUBSYSTEM=="memory", ACTION=="add",
> > > DEVPATH=="/devices/system/memory/memory*[0-9]",
> > > TEST=="/sys$devpath/state", RUN+="/bin/sh -c 'echo online >
> > > /sys$devpath/state'"
> > >
> > > OK so this is not prepared for a temporary failures and retries.
> > >
> > > > >And again, why cannot we simply make the onlining fail or
> > > > >try_lock and retry internally if the event consumer cannot cope with
> errors?
> > > >
> > > > Did you mean the following Srinivasan's first patch looks good to you?
> > > >   https://lkml.org/lkml/2014/12/2/662
> > >
> > > Heh, I was just about to post this. Because I haven't noticed the
> > > previous patch yet. Yeah, Something like that. Except that I would
> > > expect EAGAIN or EBUSY rather than ERESTARTSYS which should never
> > > leak into userspace. And that would happen here AFAICS because
> > > signal_pending will not be true usually.
> > Michal,
> >
> > I agree that the fix to this problem must be outside the clients of
> > add_memory() and that is the reason I had sent that patch:
> > https://lkml.org/lkml/2014/12/2/662. Let me know if you want me to
> > resend this patch with the correct return value.
> 
> Please think about the other suggested options as well.

Thanks Michal. I will look at the other options you have listed as well.

K. Y
> 
> > Regards,
> >
> > K. Y
> > >
> > > So there are two options. Either make the udev rule more robust and
> > > retry within RUN section or do the retry withing online_pages
> > > (try_lock and go into interruptible sleep which gets signaled by
> > > finished add_memory()). The later option is safer wrt. the userspace
> > > because the operation wouldn't fail unexpectedly.
> > > Another option would be generating the sysfs file after all the
> > > internal initialization is done and call it outside of the memory hotplug
> lock.
> > >
> > > --
> > > Michal Hocko
> > > SUSE Labs
> 
> --
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2014-12-14 18:45 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-06  0:41 [PATCH 0/2] Drivers: hv: hv_balloon: Fix a deadlock in the hot-add path K. Y. Srinivasan
2014-12-06  0:41 ` [PATCH 1/2] Drivers: base: core: Export functions to lock/unlock device hotplug lock K. Y. Srinivasan
2014-12-06  0:41   ` [PATCH 2/2] Drivers: hv: balloon: Fix the deadlock issue in the memory hot-add code K. Y. Srinivasan
2014-12-08 15:04     ` Michal Hocko
2014-12-09  1:23       ` Yasuaki Ishimatsu
2014-12-09  9:08         ` Michal Hocko
2014-12-09 10:25           ` Yasuaki Ishimatsu
2014-12-09 10:55             ` Michal Hocko
2014-12-11  0:21               ` KY Srinivasan
2014-12-11 12:58                 ` Michal Hocko
2014-12-14 18:45                   ` KY Srinivasan
2014-12-08  0:07 ` [PATCH 0/2] Drivers: hv: hv_balloon: Fix a deadlock in the hot-add path Yasuaki Ishimatsu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).