LKML Archive on lore.kernel.org help / color / mirror / Atom feed
* [BISECTED REGRESSION v3.18->v3.19-rc1] drm/i915: failure to poweroff after hibernation @ 2015-02-24 14:49 Bjørn Mork 2015-02-24 14:58 ` [PATCH] drm/i915: fix failure to power off after hibernate Bjørn Mork 0 siblings, 1 reply; 14+ messages in thread From: Bjørn Mork @ 2015-02-24 14:49 UTC (permalink / raw) To: linux-kernel Cc: Daniel Vetter, Jani Nikula, intel-gfx, dri-devel, Imre Deak, Ville Syrjälä Hello, My Lenovo Thinkpad X301 has failed to power off after saving the hibernation image ever since v3.19-rc1. This is a regression since v3.18. The regression is still present i v4.0-rc1. The symptoms are: Hibernation proceeds as usual, writing a complete image. But instead of powering off the laptop stays on in a "dead" state, with fans running and the "sleep" LED blinking. The only way out of this state is by hard poweroff (pressing the power button for 4+ seconds). The system can successfully resume after this, proving that the hibernation was complete. I consider the bug somewhat critical as the system will continue to draw power until it is forcibly powered off, or the battery is completely dead. This causes unexpected battery usage and unnecessary battery wear if the power-off failure goes unnoticed. I finally took the time to bisect the problem, and ended up with this surprisingly obvious result: da2bc1b9db3351addd293e5b82757efe1f77ed1d is the first bad commit commit da2bc1b9db3351addd293e5b82757efe1f77ed1d Author: Imre Deak <imre.deak@intel.com> Date: Thu Oct 23 19:23:26 2014 +0300 drm/i915: add poweroff_late handler The suspend_late handler saves some registers and powers off the device, so it doesn't have a big overhead. Calling it at S4 poweroff_late time makes the power off handling identical to the S3 suspend and S4 freeze handling, so do this for consistency. Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> :040000 040000 367eee899c6c2b2a669e2e46f68529dad0e1f7a3 78c7571e2b18dc0fb77161b8a3e32288bd4cbee8 M drivers The bisect log was: # bad: [97bf6af1f928216fd6c5a66e8a57bfa95a659672] Linux 3.19-rc1 # good: [b2776bf7149bddd1f4161f14f79520f17fc1d71d] Linux 3.18 git bisect start 'v3.19-rc1' 'v3.18' # good: [70e71ca0af244f48a5dcf56dc435243792e3a495] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next git bisect good 70e71ca0af244f48a5dcf56dc435243792e3a495 # bad: [988adfdffdd43cfd841df734664727993076d7cb] Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux git bisect bad 988adfdffdd43cfd841df734664727993076d7cb # good: [e7cf773d431a63a2417902696fcc9e0ebdc83bbe] Merge tag 'usb-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb git bisect good e7cf773d431a63a2417902696fcc9e0ebdc83bbe # bad: [1a92b7a241dcf06a92d84219b4124dcf420ae315] Merge branch 'linux-3.19' of git://anongit.freedesktop.org/git/nouveau/linux-2.6 into drm-next git bisect bad 1a92b7a241dcf06a92d84219b4124dcf420ae315 # bad: [fd172d0c47fddff801d998e38c3efdd236ed082f] Merge tag 'drm-intel-next-2014-11-07-fixups' of git://anongit.freedesktop.org/drm-intel into drm-next git bisect bad fd172d0c47fddff801d998e38c3efdd236ed082f # bad: [820d2d77482810702758381808bdbb64595298e2] drm/i915/audio: pass intel_encoder on to platform specific ELD functions git bisect bad 820d2d77482810702758381808bdbb64595298e2 # good: [11c9b6c628c646894e6ef53f92cfd33a814ee553] drm/i915: Tighting frontbuffer tracking around flips git bisect good 11c9b6c628c646894e6ef53f92cfd33a814ee553 # good: [0b14cbd2f58199a024acbe2994bb27533c97d756] drm/i915: remove dead code from legacy suspend handler git bisect good 0b14cbd2f58199a024acbe2994bb27533c97d756 # good: [f4a12ead50580c17c3641ac1a453e68b5a5195dd] drm/i915: remove unused restore_gtt_mappings optimization during suspend git bisect good f4a12ead50580c17c3641ac1a453e68b5a5195dd # bad: [aff437667b93c3d65576b02628885687c72e1b3b] drm/i915: Move flags describing VMA mappings into the VMA git bisect bad aff437667b93c3d65576b02628885687c72e1b3b # bad: [da2bc1b9db3351addd293e5b82757efe1f77ed1d] drm/i915: add poweroff_late handler git bisect bad da2bc1b9db3351addd293e5b82757efe1f77ed1d # good: [f2476ae65e6159b41168bc41c630e9fbb1d72dde] drm/i915: disable/re-enable PCI device around S4 freeze/thaw git bisect good f2476ae65e6159b41168bc41c630e9fbb1d72dde # good: [5e365c391aeffe8b53d6952c28a68bd5fc856390] drm/i915: sanitize suspend/resume helper function names git bisect good 5e365c391aeffe8b53d6952c28a68bd5fc856390 My rather old system has a GM45 chip, but I would expect the bug to show up on most (all?) i915 systems: nemi:/tmp# lspci -vvvnns 2 00:02.0 VGA compatible controller [0300]: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller [8086:2a42] (rev 07) (prog-if 00 [VGA controller]) Subsystem: Lenovo Device [17aa:20e4] Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin A routed to IRQ 28 Region 0: Memory at f0000000 (64-bit, non-prefetchable) [size=4M] Region 2: Memory at d0000000 (64-bit, prefetchable) [size=256M] Region 4: I/O ports at 1800 [size=8] Expansion ROM at <unassigned> [disabled] Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit- Address: fee0100c Data: 41c2 Capabilities: [d0] Power Management version 3 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- Kernel driver in use: i915 00:02.1 Display controller [0380]: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller [8086:2a43] (rev 07) Subsystem: Lenovo Device [17aa:20e4] Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Region 0: Memory at f0400000 (64-bit, non-prefetchable) [size=1M] Capabilities: [d0] Power Management version 3 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- I'll follow up with a patch reverting the buggy commit. The patch has been successfully verified to fix the problem on top of v4.0-rc1, but should also go into the v3.19 stable series. Bjørn ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH] drm/i915: fix failure to power off after hibernate 2015-02-24 14:49 [BISECTED REGRESSION v3.18->v3.19-rc1] drm/i915: failure to poweroff after hibernation Bjørn Mork @ 2015-02-24 14:58 ` Bjørn Mork 2015-02-24 16:12 ` Imre Deak 0 siblings, 1 reply; 14+ messages in thread From: Bjørn Mork @ 2015-02-24 14:58 UTC (permalink / raw) To: linux-kernel Cc: Daniel Vetter, Jani Nikula, intel-gfx, dri-devel, Bjørn Mork, stable, Imre Deak, Ville Syrjälä This fixes a bug where hibernation completes, but the system fails to power off after the image has been saved. Bisection lead to commit da2bc1b9db33 ("drm/i915: add poweroff_late handler") which added a .poweroff_late hook pointing to the same function as the .freeze_late hook, without any justification or explanation why this would be correct, except "for consistency". The assumption that this "makes the power off handling identical to the S3 suspend and S4 freeze handling" is completely bogus. As clearly documented in Documentation/power/devices.txt, the poweroff* hooks are part of the hibernation API along with the freeze* hooks. The phases involved in hibernation are: prepare, freeze, freeze_late, freeze_noirq, thaw_noirq, thaw_early, thaw, complete, prepare, poweroff, poweroff_late, poweroff_noirq With the above sequence in mind, it should be fairly obvious that you cannot save registers and disable the device in both the freeze_late and poweroff_late phases. Or as Documentation/power/devices.txt explains it: The poweroff, poweroff_late and poweroff_noirq callbacks should do essentially the same things as the suspend, suspend_late and suspend_noirq callbacks, respectively. The only notable difference is that they need not store the device register values, because the registers should already have been stored during the freeze, freeze_late or freeze_noirq phases. The "consistency" between suspend and hibernate was already in place, with freeze_late pointing to the same function as suspend_late. There is no need for any poweroff_late hook in this driver. This reverts commit da2bc1b9db33. Fixes: da2bc1b9db33 ("drm/i915: add poweroff_late handler") Cc: <stable@vger.kernel.org> Cc: Imre Deak <imre.deak@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Bjørn Mork <bjorn@mork.no> --- drivers/gpu/drm/i915/i915_drv.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 8039cec71fc2..9cd695710f93 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -1520,7 +1520,6 @@ static const struct dev_pm_ops i915_pm_ops = { .thaw_early = i915_pm_resume_early, .thaw = i915_pm_resume, .poweroff = i915_pm_suspend, - .poweroff_late = i915_pm_suspend_late, .restore_early = i915_pm_resume_early, .restore = i915_pm_resume, -- 1.7.10.4 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] drm/i915: fix failure to power off after hibernate 2015-02-24 14:58 ` [PATCH] drm/i915: fix failure to power off after hibernate Bjørn Mork @ 2015-02-24 16:12 ` Imre Deak 2015-02-24 19:00 ` Bjørn Mork 0 siblings, 1 reply; 14+ messages in thread From: Imre Deak @ 2015-02-24 16:12 UTC (permalink / raw) To: Bjørn Mork Cc: linux-kernel, Daniel Vetter, Jani Nikula, intel-gfx, dri-devel, stable, Ville Syrjälä On ti, 2015-02-24 at 15:58 +0100, Bjørn Mork wrote: > This fixes a bug where hibernation completes, but the system > fails to power off after the image has been saved. > > Bisection lead to commit da2bc1b9db33 ("drm/i915: add poweroff_late > handler") which added a .poweroff_late hook pointing to the same > function as the .freeze_late hook, without any justification or > explanation why this would be correct, except "for consistency". > > The assumption that this "makes the power off handling identical to > the S3 suspend and S4 freeze handling" is completely bogus. As clearly > documented in Documentation/power/devices.txt, the poweroff* hooks > are part of the hibernation API along with the freeze* hooks. The > phases involved in hibernation are: > > prepare, freeze, freeze_late, freeze_noirq, thaw_noirq, thaw_early, > thaw, complete, prepare, poweroff, poweroff_late, poweroff_noirq > > With the above sequence in mind, it should be fairly obvious that you > cannot save registers and disable the device in both the freeze_late > and poweroff_late phases. Or as Documentation/power/devices.txt > explains it: > > The poweroff, poweroff_late and poweroff_noirq callbacks should do essentially > the same things as the suspend, suspend_late and suspend_noirq callbacks, > respectively. The only notable difference is that they need not store the > device register values, because the registers should already have been stored > during the freeze, freeze_late or freeze_noirq phases. > > The "consistency" between suspend and hibernate was already in > place, with freeze_late pointing to the same function as suspend_late. > There is no need for any poweroff_late hook in this driver. The poweroff handlers undo the actions of the thaw handlers. As the original commit stated saving the registers is not needed there, but it's also not a big overhead and there should be no problem doing it. We are planning to optimize the hibernation sequence by for example not shutting down the display between freeze and thaw, and also getting rid of unnecessary steps at the power off phase. But before that we want to actually unify things rather than having special cases, as maintaining the special code paths caused already quite a lot of problems for us so far. Reverting the commit may hide some other issue, so before doing that could you try the following patch: http://lists.freedesktop.org/archives/intel-gfx/2015-February/060529.html If with that you still get the hang could you try on top of that the patch below, first having only pci_set_power_state uncommented, then both pci_set_power_state and pci_disable_device uncommented? Thanks, Imre diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 4badb23..dc91d4b 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -968,6 +968,23 @@ static int i915_pm_suspend_late(struct device *dev) return i915_drm_suspend_late(drm_dev); } +static int i915_pm_poweroff_late(struct device *dev) +{ + struct drm_device *drm_dev = dev_to_i915(dev)->dev; + struct drm_i915_private *dev_priv = drm_dev->dev_private; + int ret; + + ret = intel_suspend_complete(dev_priv); + + if (ret) + return ret; + + pci_disable_device(drm_dev->pdev); +// pci_set_power_state(drm_dev->pdev, PCI_D3hot); + + return 0; +} + static int i915_pm_resume_early(struct device *dev) { struct drm_device *drm_dev = dev_to_i915(dev)->dev; @@ -1535,7 +1552,7 @@ static const struct dev_pm_ops i915_pm_ops = { .thaw_early = i915_pm_resume_early, .thaw = i915_pm_resume, .poweroff = i915_pm_suspend, - .poweroff_late = i915_pm_suspend_late, + .poweroff_late = i915_pm_poweroff_late, .restore_early = i915_pm_resume_early, .restore = i915_pm_resume, ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] drm/i915: fix failure to power off after hibernate 2015-02-24 16:12 ` Imre Deak @ 2015-02-24 19:00 ` Bjørn Mork 2015-02-25 18:33 ` Imre Deak 0 siblings, 1 reply; 14+ messages in thread From: Bjørn Mork @ 2015-02-24 19:00 UTC (permalink / raw) To: Imre Deak Cc: linux-kernel, Daniel Vetter, Jani Nikula, intel-gfx, dri-devel, stable, Ville Syrjälä Imre Deak <imre.deak@intel.com> writes: > The poweroff handlers undo the actions of the thaw handlers. As the > original commit stated saving the registers is not needed there, but > it's also not a big overhead and there should be no problem doing it. We > are planning to optimize the hibernation sequence by for example not > shutting down the display between freeze and thaw, and also getting rid > of unnecessary steps at the power off phase. But before that we want to > actually unify things rather than having special cases, as maintaining > the special code paths caused already quite a lot of problems for us so > far. That sounds like a worthy goal. I don't understand what you hope to achieve by having a poweroff_late hook, since there aren't really anything useful left to do at the point it is called, but if you want a dummy callback there for code structure reasons then fine. But you cannot just run around breaking stuff while slowly moving towards this goal over multiple releases... v3.19 is currently broken and that seems totally unnecessary. In any case: You should have noticed this problem while testing your patches. The breakage is 100% reproducible. Unfortunately I had to do a bisect to realize what you had done to the i915 driver, something I unfortunately didn't find time to do before v3.19 was released. But I do find it unnecessary to release with such bugs. Any attempt to exercise the code path you modified would have revealed the bug. > Reverting the commit may hide some other issue, so before doing that > could you try the following patch: > http://lists.freedesktop.org/archives/intel-gfx/2015-February/060529.html Makes no difference. I assume that patch fixes an unrelated bug? The age and reported symptoms indicates so. Note that I am reporting a regression introduced after v3.18, while that seems to fix a bug introduced in v3.17. Both v3.17 and v3.18 (including v3.18.6), as well as earlier releases, work fine for me. > If with that you still get the hang could you try on top of that the > patch below, first having only pci_set_power_state uncommented, then > both pci_set_power_state and pci_disable_device uncommented? That patch fixes the problem, with only pci_set_power_state commented out. Do you still want me to try with pci_disable_device() commented out as well? Bjørn ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] drm/i915: fix failure to power off after hibernate 2015-02-24 19:00 ` Bjørn Mork @ 2015-02-25 18:33 ` Imre Deak 2015-02-26 9:34 ` Bjørn Mork 0 siblings, 1 reply; 14+ messages in thread From: Imre Deak @ 2015-02-25 18:33 UTC (permalink / raw) To: Bjørn Mork Cc: linux-kernel, Daniel Vetter, Jani Nikula, intel-gfx, dri-devel, stable, Ville Syrjälä [-- Attachment #1: Type: text/plain, Size: 4255 bytes --] On ti, 2015-02-24 at 20:00 +0100, Bjørn Mork wrote: > Imre Deak <imre.deak@intel.com> writes: > > > The poweroff handlers undo the actions of the thaw handlers. As the > > original commit stated saving the registers is not needed there, but > > it's also not a big overhead and there should be no problem doing it. We > > are planning to optimize the hibernation sequence by for example not > > shutting down the display between freeze and thaw, and also getting rid > > of unnecessary steps at the power off phase. But before that we want to > > actually unify things rather than having special cases, as maintaining > > the special code paths caused already quite a lot of problems for us so > > far. > > That sounds like a worthy goal. I don't understand what you hope to > achieve by having a poweroff_late hook, since there aren't really > anything useful left to do at the point it is called, but if you want a > dummy callback there for code structure reasons then fine. There are the following reasons for shutting down the device properly during hibernation: - ACPI mandates that the OSPM (the kernel in our case) puts all devices into D3 that are not wake-up sources (i915 is not) (Kudos to Ville for pointing this out) - Embedded panels have a well defined shutdown sequence. We don't have any good reason to not follow this, in fact for some panels the subsequent reinitialization could be problematic in case of a hard power-off. (Thanks to Jani for this info) In fact the only reason why we didn't put the device into PCI D3 before the patch you bisected, is kind of arbitrary. The PCI core puts every device into D3 unless its driver saves the device's PCI state on its own. Since before that patch we did save the state, but haven't put the device into D3 it stayed in D0. That was definitely not intentional and as such we depended on the BIOS to correct this for us afterwards. > But you cannot just run around breaking stuff while slowly moving > towards this goal over multiple releases... v3.19 is currently broken > and that seems totally unnecessary. > > In any case: You should have noticed this problem while testing your > patches. The breakage is 100% reproducible. Unfortunately I had to do a > bisect to realize what you had done to the i915 driver, something I > unfortunately didn't find time to do before v3.19 was released. But I > do find it unnecessary to release with such bugs. Any attempt to > exercise the code path you modified would have revealed the bug. We tested these patches on numerous platforms and haven't seen the issue you reported. Based on your feedback the current assumption is that this is a bug in your BIOS, which tries to access the device despite it being powered down. We're trying our best to test each change we commit on a big set of platforms, but - especially on older hardware with random BIOS versions/configurations - full coverage is not possible. So we depend on reports like yours a lot to fill in the gaps. > > Reverting the commit may hide some other issue, so before doing that > > could you try the following patch: > > http://lists.freedesktop.org/archives/intel-gfx/2015-February/060529.html > > Makes no difference. I assume that patch fixes an unrelated bug? The > age and reported symptoms indicates so. Note that I am reporting a > regression introduced after v3.18, while that seems to fix a bug > introduced in v3.17. Both v3.17 and v3.18 (including v3.18.6), as > well as earlier releases, work fine for me. Ok, thanks for trying. > > If with that you still get the hang could you try on top of that the > > patch below, first having only pci_set_power_state uncommented, then > > both pci_set_power_state and pci_disable_device uncommented? > > That patch fixes the problem, with only pci_set_power_state commented > out. Do you still want me to try with pci_disable_device() commented > out as well? No, but it would help if you could still try the two attached patch separately, without any of the previous workarounds. Based on the result, we'll follow up with a fix that adds for your specific platform either of the attached workarounds or simply avoids putting the device into D3 (corresponding to the patch you already tried). Thanks, Imre [-- Attachment #2: 0001-drm-i915-zero-PCI_COMMAND-at-the-end-of-hibernation.patch --] [-- Type: text/x-patch, Size: 783 bytes --] From fe8b90c976f14eab80cb6715d0405157163d316e Mon Sep 17 00:00:00 2001 From: Imre Deak <imre.deak@intel.com> Date: Wed, 25 Feb 2015 19:38:53 +0200 Subject: [PATCH] drm/i915: zero PCI_COMMAND at the end of hibernation Signed-off-by: Imre Deak <imre.deak@intel.com> --- drivers/gpu/drm/i915/i915_drv.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 4badb23..b226cc6 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -653,6 +653,8 @@ static int i915_drm_suspend_late(struct drm_device *drm_dev) pci_disable_device(drm_dev->pdev); pci_set_power_state(drm_dev->pdev, PCI_D3hot); + pci_write_config_word(drm_dev->pdev, PCI_COMMAND, 0); + return 0; } -- 2.1.0 [-- Attachment #3: 0002-drm-i915-use-D3cold-instead-of-D3hot-during-suspend-.patch --] [-- Type: text/x-patch, Size: 804 bytes --] From 18310629cbd32327edd8bc30969e23a7b20a3ce3 Mon Sep 17 00:00:00 2001 From: Imre Deak <imre.deak@intel.com> Date: Wed, 25 Feb 2015 19:43:33 +0200 Subject: [PATCH] drm/i915: use D3cold instead of D3hot during suspend/hibernate Signed-off-by: Imre Deak <imre.deak@intel.com> --- drivers/gpu/drm/i915/i915_drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 4badb23..807443a 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -651,7 +651,7 @@ static int i915_drm_suspend_late(struct drm_device *drm_dev) } pci_disable_device(drm_dev->pdev); - pci_set_power_state(drm_dev->pdev, PCI_D3hot); + pci_set_power_state(drm_dev->pdev, PCI_D3cold); return 0; } -- 2.1.0 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] drm/i915: fix failure to power off after hibernate 2015-02-25 18:33 ` Imre Deak @ 2015-02-26 9:34 ` Bjørn Mork 2015-02-26 18:50 ` Imre Deak 0 siblings, 1 reply; 14+ messages in thread From: Bjørn Mork @ 2015-02-26 9:34 UTC (permalink / raw) To: Imre Deak Cc: linux-kernel, Daniel Vetter, Jani Nikula, intel-gfx, dri-devel, stable, Ville Syrjälä Imre Deak <imre.deak@intel.com> writes: >> That patch fixes the problem, with only pci_set_power_state commented >> out. Do you still want me to try with pci_disable_device() commented >> out as well? > > No, but it would help if you could still try the two attached patch > separately, without any of the previous workarounds. Based on the > result, we'll follow up with a fix that adds for your specific platform > either of the attached workarounds or simply avoids putting the device > into D3 (corresponding to the patch you already tried). None of those patches made any difference. The laptop still hangs at power-off. Not really surprising, is it? Previous testing shows that the hang occurs at the pci_set_power_state(drm_dev->pdev, PCI_D3hot) call in the poweroff_late hook. It is hard to see how replacing it by an attempt to set D3cold, or adding any call after this point, could possibly change anything. The system is stil hanging at the pci_set_power_state() call. The generic pci-driver code will put the i915 device into PCI_D3hot for you, won't it? Why do you need to duplicate that in the driver, *knowing* that doing so breaks (at least some) systems? I honestly don't think this "let's try some random code" is the proper way to fix this bug (or any other bug for that matter). You need to start understanding the code you write, and the first step is by actually explaining the changes you make. I also believe that you completely miss the fact that this bug has survived a full release cycle before you became aware of it, and the implications this has wrt other affected systems/users. I assume you understand that my system isn't one-of-a-kind, This means that there are other affected users with identical/similar systems. Now, if none of those users reported the bug to you (we all know why: Linux kernel development is currently limited by the available testing resources, NOT by the available developer resources), then how do you know that there aren't a number of other systems affected as well? Let me answer that for you: You don't. Which is why you must explain the mechanism triggering the bug, proving that it is a chipset/system specific issue. Because that's the only way you will *know* that you have solved the problem not only for me, but for all affected users. IMHO, the only safe and sane solution at the moment is the revert patch I posted. It's a simple fix, reverting back to the *known* working state before this regression was introduced. Then you can start over from there, trying to implement this properly. Thanks, Bjørn ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] drm/i915: fix failure to power off after hibernate 2015-02-26 9:34 ` Bjørn Mork @ 2015-02-26 18:50 ` Imre Deak 2015-02-26 19:20 ` Bjørn Mork ` (2 more replies) 0 siblings, 3 replies; 14+ messages in thread From: Imre Deak @ 2015-02-26 18:50 UTC (permalink / raw) To: Bjørn Mork Cc: linux-kernel, Daniel Vetter, Jani Nikula, intel-gfx, dri-devel, stable, Ville Syrjälä [-- Attachment #1: Type: text/plain, Size: 3538 bytes --] On to, 2015-02-26 at 10:34 +0100, Bjørn Mork wrote: > Imre Deak <imre.deak@intel.com> writes: > > >> That patch fixes the problem, with only pci_set_power_state commented > >> out. Do you still want me to try with pci_disable_device() commented > >> out as well? > > > > No, but it would help if you could still try the two attached patch > > separately, without any of the previous workarounds. Based on the > > result, we'll follow up with a fix that adds for your specific platform > > either of the attached workarounds or simply avoids putting the device > > into D3 (corresponding to the patch you already tried). > > None of those patches made any difference. The laptop still hangs at > power-off. > > Not really surprising, is it? Previous testing shows that the hang > occurs at the pci_set_power_state(drm_dev->pdev, PCI_D3hot) call in the > poweroff_late hook. It is hard to see how replacing it by an attempt to > set D3cold, or adding any call after this point, could possibly change > anything. The system is stil hanging at the pci_set_power_state() call. Judging from the blinking LED, I assume that it's not pci_set_power_state() that hangs the machine, but the hang happens in BIOS code. > The generic pci-driver code will put the i915 device into PCI_D3hot for > you, won't it? Why do you need to duplicate that in the driver, > *knowing* that doing so breaks (at least some) systems? Letting the pci core put the device into D3 wouldn't get rid of the problem. It's putting the device into D3 in the first place what causes it. > I honestly don't think this "let's try some random code" is the proper > way to fix this bug (or any other bug for that matter). You need to > start understanding the code you write, and the first step is by > actually explaining the changes you make. We have a good understanding about the issue: the BIOS on your platform does something unexpected behind the back of the driver/kernel. In that sense the patches were not random, for instance the first one is based on an existing workaround for an issue that resembles quite a lot yours, see pci_pm_poweroff_noirq(). > I also believe that you completely miss the fact that this bug has > survived a full release cycle before you became aware of it, and the > implications this has wrt other affected systems/users. I assume you > understand that my system isn't one-of-a-kind, This means that there are > other affected users with identical/similar systems. Now, if none of > those users reported the bug to you (we all know why: Linux kernel > development is currently limited by the available testing resources, NOT > by the available developer resources), then how do you know that there > aren't a number of other systems affected as well? > > Let me answer that for you: You don't. > > Which is why you must explain the mechanism triggering the bug, proving > that it is a chipset/system specific issue. Because that's the only way > you will *know* that you have solved the problem not only for me, but for > all affected users. > > IMHO, the only safe and sane solution at the moment is the revert patch > I posted. It's a simple fix, reverting back to the *known* working > state before this regression was introduced. > > Then you can start over from there, trying to implement this properly. The current way is the proper one that we want for the generic case. The issue on your platform is the exception, so working around that is a sensible choice. Attached is the proposed fix for this issue. --Imre [-- Attachment #2: 0001-drm-i915-gm45-work-around-hang-during-hibernation.patch --] [-- Type: text/x-patch, Size: 3625 bytes --] From 5c23657bc168db12a1ba100ab65fabd305c89c8a Mon Sep 17 00:00:00 2001 From: Imre Deak <imre.deak@intel.com> Date: Thu, 26 Feb 2015 18:38:53 +0200 Subject: [PATCH] drm/i915: gm45: work around hang during hibernation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Bjørn reported that his machine hang during hibernation and eventually bisected the problem to the following commit: commit da2bc1b9db3351addd293e5b82757efe1f77ed1d Author: Imre Deak <imre.deak@intel.com> Date: Thu Oct 23 19:23:26 2014 +0300 drm/i915: add poweroff_late handler The problem seems to be that after the kernel puts the device into D3 the BIOS still tries to access it, or otherwise assumes that it's in D0. This is clearly bogus, since ACPI mandates that devices are put into D3 by the OSPM if they are not wake-up sources. In the future we want to unify more of the driver's runtime and system suspend paths, for example by skipping all the system suspend/hibernation hooks if the device is runtime suspended already. Accordingly for all other platforms the goal is still to properly power down the device during hibernation. Reference: http://lists.freedesktop.org/archives/intel-gfx/2015-February/060633.html Reported-and-bisected-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: Imre Deak <imre.deak@intel.com> --- drivers/gpu/drm/i915/i915_drv.c | 27 ++++++++++++++++++++++----- 1 file changed, 22 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 4badb23..67d212b 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -637,7 +637,7 @@ static int i915_drm_suspend(struct drm_device *dev) return 0; } -static int i915_drm_suspend_late(struct drm_device *drm_dev) +static int i915_drm_suspend_late(struct drm_device *drm_dev, bool hibernation) { struct drm_i915_private *dev_priv = drm_dev->dev_private; int ret; @@ -651,7 +651,14 @@ static int i915_drm_suspend_late(struct drm_device *drm_dev) } pci_disable_device(drm_dev->pdev); - pci_set_power_state(drm_dev->pdev, PCI_D3hot); + /* + * During hibernation on some GM45 platforms the BIOS may try to access + * the device even though it's already in D3 and hang the machine. So + * leave the device in D0 on those platforms and hope the BIOS will + * power down the device properly. + */ + if (!(hibernation && IS_GM45(dev_priv))) + pci_set_power_state(drm_dev->pdev, PCI_D3hot); return 0; } @@ -677,7 +684,7 @@ int i915_suspend_legacy(struct drm_device *dev, pm_message_t state) if (error) return error; - return i915_drm_suspend_late(dev); + return i915_drm_suspend_late(dev, false); } static int i915_drm_resume(struct drm_device *dev) @@ -965,7 +972,17 @@ static int i915_pm_suspend_late(struct device *dev) if (drm_dev->switch_power_state == DRM_SWITCH_POWER_OFF) return 0; - return i915_drm_suspend_late(drm_dev); + return i915_drm_suspend_late(drm_dev, false); +} + +static int i915_pm_poweroff_late(struct device *dev) +{ + struct drm_device *drm_dev = dev_to_i915(dev)->dev; + + if (drm_dev->switch_power_state == DRM_SWITCH_POWER_OFF) + return 0; + + return i915_drm_suspend_late(drm_dev, true); } static int i915_pm_resume_early(struct device *dev) @@ -1535,7 +1552,7 @@ static const struct dev_pm_ops i915_pm_ops = { .thaw_early = i915_pm_resume_early, .thaw = i915_pm_resume, .poweroff = i915_pm_suspend, - .poweroff_late = i915_pm_suspend_late, + .poweroff_late = i915_pm_poweroff_late, .restore_early = i915_pm_resume_early, .restore = i915_pm_resume, -- 2.1.0 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] drm/i915: fix failure to power off after hibernate 2015-02-26 18:50 ` Imre Deak @ 2015-02-26 19:20 ` Bjørn Mork 2015-02-26 20:01 ` [Intel-gfx] " Daniel Vetter 2015-02-26 20:05 ` Ville Syrjälä 2 siblings, 0 replies; 14+ messages in thread From: Bjørn Mork @ 2015-02-26 19:20 UTC (permalink / raw) To: Imre Deak Cc: linux-kernel, Daniel Vetter, Jani Nikula, intel-gfx, dri-devel, stable, Ville Syrjälä Imre Deak <imre.deak@intel.com> writes: > Attached is the proposed fix for this issue. > > --Imre > > From 5c23657bc168db12a1ba100ab65fabd305c89c8a Mon Sep 17 00:00:00 2001 > From: Imre Deak <imre.deak@intel.com> > Date: Thu, 26 Feb 2015 18:38:53 +0200 > Subject: [PATCH] drm/i915: gm45: work around hang during hibernation I can confirm that this patch solves the problem for me. Thanks. Bjørn ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Intel-gfx] [PATCH] drm/i915: fix failure to power off after hibernate 2015-02-26 18:50 ` Imre Deak 2015-02-26 19:20 ` Bjørn Mork @ 2015-02-26 20:01 ` Daniel Vetter 2015-02-27 12:15 ` David Weinehall 2015-02-26 20:05 ` Ville Syrjälä 2 siblings, 1 reply; 14+ messages in thread From: Daniel Vetter @ 2015-02-26 20:01 UTC (permalink / raw) To: Imre Deak Cc: Bjørn Mork, intel-gfx, linux-kernel, dri-devel, stable, Daniel Vetter On Thu, Feb 26, 2015 at 08:50:48PM +0200, Imre Deak wrote: > On to, 2015-02-26 at 10:34 +0100, Bjørn Mork wrote: > > Imre Deak <imre.deak@intel.com> writes: > > > > >> That patch fixes the problem, with only pci_set_power_state commented > > >> out. Do you still want me to try with pci_disable_device() commented > > >> out as well? > > > > > > No, but it would help if you could still try the two attached patch > > > separately, without any of the previous workarounds. Based on the > > > result, we'll follow up with a fix that adds for your specific platform > > > either of the attached workarounds or simply avoids putting the device > > > into D3 (corresponding to the patch you already tried). > > > > None of those patches made any difference. The laptop still hangs at > > power-off. > > > > Not really surprising, is it? Previous testing shows that the hang > > occurs at the pci_set_power_state(drm_dev->pdev, PCI_D3hot) call in the > > poweroff_late hook. It is hard to see how replacing it by an attempt to > > set D3cold, or adding any call after this point, could possibly change > > anything. The system is stil hanging at the pci_set_power_state() call. > > Judging from the blinking LED, I assume that it's not > pci_set_power_state() that hangs the machine, but the hang happens in > BIOS code. > > > The generic pci-driver code will put the i915 device into PCI_D3hot for > > you, won't it? Why do you need to duplicate that in the driver, > > *knowing* that doing so breaks (at least some) systems? > > Letting the pci core put the device into D3 wouldn't get rid of the problem. > It's putting the device into D3 in the first place what causes it. > > > I honestly don't think this "let's try some random code" is the proper > > way to fix this bug (or any other bug for that matter). You need to > > start understanding the code you write, and the first step is by > > actually explaining the changes you make. > > We have a good understanding about the issue: the BIOS on your platform > does something unexpected behind the back of the driver/kernel. In that > sense the patches were not random, for instance the first one is based on > an existing workaround for an issue that resembles quite a lot yours, see > pci_pm_poweroff_noirq(). > > > I also believe that you completely miss the fact that this bug has > > survived a full release cycle before you became aware of it, and the > > implications this has wrt other affected systems/users. I assume you > > understand that my system isn't one-of-a-kind, This means that there are > > other affected users with identical/similar systems. Now, if none of > > those users reported the bug to you (we all know why: Linux kernel > > development is currently limited by the available testing resources, NOT > > by the available developer resources), then how do you know that there > > aren't a number of other systems affected as well? > > > > Let me answer that for you: You don't. > > > > Which is why you must explain the mechanism triggering the bug, proving > > that it is a chipset/system specific issue. Because that's the only way > > you will *know* that you have solved the problem not only for me, but for > > all affected users. > > > > IMHO, the only safe and sane solution at the moment is the revert patch > > I posted. It's a simple fix, reverting back to the *known* working > > state before this regression was introduced. > > > > Then you can start over from there, trying to implement this properly. > > The current way is the proper one that we want for the generic case. The issue > on your platform is the exception, so working around that is a sensible choice. > > Attached is the proposed fix for this issue. > > --Imre > > From 5c23657bc168db12a1ba100ab65fabd305c89c8a Mon Sep 17 00:00:00 2001 > From: Imre Deak <imre.deak@intel.com> > Date: Thu, 26 Feb 2015 18:38:53 +0200 > Subject: [PATCH] drm/i915: gm45: work around hang during hibernation > MIME-Version: 1.0 > Content-Type: text/plain; charset=UTF-8 > Content-Transfer-Encoding: 8bit > > Bjørn reported that his machine hang during hibernation and eventually > bisected the problem to the following commit: > > commit da2bc1b9db3351addd293e5b82757efe1f77ed1d > Author: Imre Deak <imre.deak@intel.com> > Date: Thu Oct 23 19:23:26 2014 +0300 > > drm/i915: add poweroff_late handler > > The problem seems to be that after the kernel puts the device into D3 > the BIOS still tries to access it, or otherwise assumes that it's in D0. > This is clearly bogus, since ACPI mandates that devices are put into D3 > by the OSPM if they are not wake-up sources. In the future we want to > unify more of the driver's runtime and system suspend paths, for example > by skipping all the system suspend/hibernation hooks if the device is > runtime suspended already. Accordingly for all other platforms the goal > is still to properly power down the device during hibernation. > > Reference: http://lists.freedesktop.org/archives/intel-gfx/2015-February/060633.html > Reported-and-bisected-by: Bjørn Mork <bjorn@mork.no> > Signed-off-by: Imre Deak <imre.deak@intel.com> > --- > drivers/gpu/drm/i915/i915_drv.c | 27 ++++++++++++++++++++++----- > 1 file changed, 22 insertions(+), 5 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c > index 4badb23..67d212b 100644 > --- a/drivers/gpu/drm/i915/i915_drv.c > +++ b/drivers/gpu/drm/i915/i915_drv.c > @@ -637,7 +637,7 @@ static int i915_drm_suspend(struct drm_device *dev) > return 0; > } > > -static int i915_drm_suspend_late(struct drm_device *drm_dev) > +static int i915_drm_suspend_late(struct drm_device *drm_dev, bool hibernation) > { > struct drm_i915_private *dev_priv = drm_dev->dev_private; > int ret; > @@ -651,7 +651,14 @@ static int i915_drm_suspend_late(struct drm_device *drm_dev) > } > > pci_disable_device(drm_dev->pdev); > - pci_set_power_state(drm_dev->pdev, PCI_D3hot); > + /* > + * During hibernation on some GM45 platforms the BIOS may try to access > + * the device even though it's already in D3 and hang the machine. So > + * leave the device in D0 on those platforms and hope the BIOS will > + * power down the device properly. > + */ > + if (!(hibernation && IS_GM45(dev_priv))) > + pci_set_power_state(drm_dev->pdev, PCI_D3hot); If we see more of these troublesome machines we might need to apply an even bigger hammer like gen < 5 or so. But as long as there's just 1 report I think this is the right approach. Bjorn, as soon as we have your tested-by that this work we can apply this and shovel it through the backporting machinery. Thanks, Daniel > > return 0; > } > @@ -677,7 +684,7 @@ int i915_suspend_legacy(struct drm_device *dev, pm_message_t state) > if (error) > return error; > > - return i915_drm_suspend_late(dev); > + return i915_drm_suspend_late(dev, false); > } > > static int i915_drm_resume(struct drm_device *dev) > @@ -965,7 +972,17 @@ static int i915_pm_suspend_late(struct device *dev) > if (drm_dev->switch_power_state == DRM_SWITCH_POWER_OFF) > return 0; > > - return i915_drm_suspend_late(drm_dev); > + return i915_drm_suspend_late(drm_dev, false); > +} > + > +static int i915_pm_poweroff_late(struct device *dev) > +{ > + struct drm_device *drm_dev = dev_to_i915(dev)->dev; > + > + if (drm_dev->switch_power_state == DRM_SWITCH_POWER_OFF) > + return 0; > + > + return i915_drm_suspend_late(drm_dev, true); > } > > static int i915_pm_resume_early(struct device *dev) > @@ -1535,7 +1552,7 @@ static const struct dev_pm_ops i915_pm_ops = { > .thaw_early = i915_pm_resume_early, > .thaw = i915_pm_resume, > .poweroff = i915_pm_suspend, > - .poweroff_late = i915_pm_suspend_late, > + .poweroff_late = i915_pm_poweroff_late, > .restore_early = i915_pm_resume_early, > .restore = i915_pm_resume, > > -- > 2.1.0 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Intel-gfx] [PATCH] drm/i915: fix failure to power off after hibernate 2015-02-26 20:01 ` [Intel-gfx] " Daniel Vetter @ 2015-02-27 12:15 ` David Weinehall 2015-02-27 18:23 ` Imre Deak 0 siblings, 1 reply; 14+ messages in thread From: David Weinehall @ 2015-02-27 12:15 UTC (permalink / raw) To: intel-gfx Cc: Imre Deak, Bjørn Mork, linux-kernel, dri-devel, stable, Daniel Vetter On Thu, Feb 26, 2015 at 09:01:27PM +0100, Daniel Vetter wrote: > On Thu, Feb 26, 2015 at 08:50:48PM +0200, Imre Deak wrote: [snip] > > > > The problem seems to be that after the kernel puts the device into D3 > > the BIOS still tries to access it, or otherwise assumes that it's in D0. > > This is clearly bogus, since ACPI mandates that devices are put into D3 > > by the OSPM if they are not wake-up sources. In the future we want to > > unify more of the driver's runtime and system suspend paths, for example > > by skipping all the system suspend/hibernation hooks if the device is > > runtime suspended already. Accordingly for all other platforms the goal > > is still to properly power down the device during hibernation. [snip] > > If we see more of these troublesome machines we might need to apply an > even bigger hammer like gen < 5 or so. But as long as there's just 1 > report I think this is the right approach. > > Bjorn, as soon as we have your tested-by that this work we can apply this > and shovel it through the backporting machinery. Isn't there a risk that this will negatively impact machines using gen4 that *don't* have a buggy BIOS? Wouldn't a quirk tied to the laptop or BIOS version be better suited -- or possibly a parameter that can be passed to the driver, which would make it easier to test if others suffering from similar symptoms on other systems suffer from the same issue or not? Just my 2¢. Kind regards, David Weinehall ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Intel-gfx] [PATCH] drm/i915: fix failure to power off after hibernate 2015-02-27 12:15 ` David Weinehall @ 2015-02-27 18:23 ` Imre Deak 2015-03-02 8:55 ` David Weinehall 0 siblings, 1 reply; 14+ messages in thread From: Imre Deak @ 2015-02-27 18:23 UTC (permalink / raw) To: David Weinehall Cc: intel-gfx, Bjørn Mork, linux-kernel, dri-devel, stable, Daniel Vetter On Fri, 2015-02-27 at 14:15 +0200, David Weinehall wrote: > On Thu, Feb 26, 2015 at 09:01:27PM +0100, Daniel Vetter wrote: > > On Thu, Feb 26, 2015 at 08:50:48PM +0200, Imre Deak wrote: > > [snip] > > > > > > The problem seems to be that after the kernel puts the device into D3 > > > the BIOS still tries to access it, or otherwise assumes that it's in D0. > > > This is clearly bogus, since ACPI mandates that devices are put into D3 > > > by the OSPM if they are not wake-up sources. In the future we want to > > > unify more of the driver's runtime and system suspend paths, for example > > > by skipping all the system suspend/hibernation hooks if the device is > > > runtime suspended already. Accordingly for all other platforms the goal > > > is still to properly power down the device during hibernation. > > [snip] > > > > If we see more of these troublesome machines we might need to apply an > > even bigger hammer like gen < 5 or so. But as long as there's just 1 > > report I think this is the right approach. > > > > Bjorn, as soon as we have your tested-by that this work we can apply this > > and shovel it through the backporting machinery. > > Isn't there a risk that this will negatively impact machines using gen4 > that *don't* have a buggy BIOS? Wouldn't a quirk tied to the laptop > or BIOS version be better suited -- or possibly a parameter that can be > passed to the driver, which would make it easier to test if others > suffering from similar symptoms on other systems suffer from the same > issue or not? > > Just my 2¢. We've checked today with Ville a few machines we've found from GEN2 to GEN5+. There was one Thinkpad x61s (GEN4) where I could reproduce the exact same problem and get rid of it using the same workaround. All the others were non-Lenovo (including GEN4) mobile and desktop platforms and none of these had the problem. I also tried it on a Lenovo IVB ultrabook, which also works fine. So the current theory is that it's related to GEN4 Thinkpad BIOSes, and so we need to change the condition to match that at least. --Imre ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Intel-gfx] [PATCH] drm/i915: fix failure to power off after hibernate 2015-02-27 18:23 ` Imre Deak @ 2015-03-02 8:55 ` David Weinehall 0 siblings, 0 replies; 14+ messages in thread From: David Weinehall @ 2015-03-02 8:55 UTC (permalink / raw) To: Imre Deak Cc: intel-gfx, Bjørn Mork, linux-kernel, dri-devel, stable, Daniel Vetter On Fri, Feb 27, 2015 at 08:23:41PM +0200, Imre Deak wrote: > We've checked today with Ville a few machines we've found from GEN2 to > GEN5+. There was one Thinkpad x61s (GEN4) where I could reproduce the > exact same problem and get rid of it using the same workaround. All the > others were non-Lenovo (including GEN4) mobile and desktop platforms and > none of these had the problem. I also tried it on a Lenovo IVB > ultrabook, which also works fine. So the current theory is that it's > related to GEN4 Thinkpad BIOSes, and so we need to change the condition > to match that at least. If you need more ThinkPads to test with I have a few at home :) Kind regards, David ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] drm/i915: fix failure to power off after hibernate 2015-02-26 18:50 ` Imre Deak 2015-02-26 19:20 ` Bjørn Mork 2015-02-26 20:01 ` [Intel-gfx] " Daniel Vetter @ 2015-02-26 20:05 ` Ville Syrjälä 2015-02-26 20:29 ` Bjørn Mork 2 siblings, 1 reply; 14+ messages in thread From: Ville Syrjälä @ 2015-02-26 20:05 UTC (permalink / raw) To: Imre Deak Cc: Bjørn Mork, linux-kernel, Daniel Vetter, Jani Nikula, intel-gfx, dri-devel, stable On Thu, Feb 26, 2015 at 08:50:48PM +0200, Imre Deak wrote: > On to, 2015-02-26 at 10:34 +0100, Bjørn Mork wrote: > > Imre Deak <imre.deak@intel.com> writes: > > > > >> That patch fixes the problem, with only pci_set_power_state commented > > >> out. Do you still want me to try with pci_disable_device() commented > > >> out as well? > > > > > > No, but it would help if you could still try the two attached patch > > > separately, without any of the previous workarounds. Based on the > > > result, we'll follow up with a fix that adds for your specific platform > > > either of the attached workarounds or simply avoids putting the device > > > into D3 (corresponding to the patch you already tried). > > > > None of those patches made any difference. The laptop still hangs at > > power-off. > > > > Not really surprising, is it? Previous testing shows that the hang > > occurs at the pci_set_power_state(drm_dev->pdev, PCI_D3hot) call in the > > poweroff_late hook. It is hard to see how replacing it by an attempt to > > set D3cold, or adding any call after this point, could possibly change > > anything. The system is stil hanging at the pci_set_power_state() call. > > Judging from the blinking LED, I assume that it's not > pci_set_power_state() that hangs the machine, but the hang happens in > BIOS code. > > > The generic pci-driver code will put the i915 device into PCI_D3hot for > > you, won't it? Why do you need to duplicate that in the driver, > > *knowing* that doing so breaks (at least some) systems? > > Letting the pci core put the device into D3 wouldn't get rid of the problem. > It's putting the device into D3 in the first place what causes it. > > > I honestly don't think this "let's try some random code" is the proper > > way to fix this bug (or any other bug for that matter). You need to > > start understanding the code you write, and the first step is by > > actually explaining the changes you make. > > We have a good understanding about the issue: the BIOS on your platform > does something unexpected behind the back of the driver/kernel. In that > sense the patches were not random, for instance the first one is based on > an existing workaround for an issue that resembles quite a lot yours, see > pci_pm_poweroff_noirq(). > > > I also believe that you completely miss the fact that this bug has > > survived a full release cycle before you became aware of it, and the > > implications this has wrt other affected systems/users. I assume you > > understand that my system isn't one-of-a-kind, This means that there are > > other affected users with identical/similar systems. Now, if none of > > those users reported the bug to you (we all know why: Linux kernel > > development is currently limited by the available testing resources, NOT > > by the available developer resources), then how do you know that there > > aren't a number of other systems affected as well? > > > > Let me answer that for you: You don't. > > > > Which is why you must explain the mechanism triggering the bug, proving > > that it is a chipset/system specific issue. Because that's the only way > > you will *know* that you have solved the problem not only for me, but for > > all affected users. > > > > IMHO, the only safe and sane solution at the moment is the revert patch > > I posted. It's a simple fix, reverting back to the *known* working > > state before this regression was introduced. > > > > Then you can start over from there, trying to implement this properly. > > The current way is the proper one that we want for the generic case. The issue > on your platform is the exception, so working around that is a sensible choice. > > Attached is the proposed fix for this issue. > > --Imre > > From 5c23657bc168db12a1ba100ab65fabd305c89c8a Mon Sep 17 00:00:00 2001 > From: Imre Deak <imre.deak@intel.com> > Date: Thu, 26 Feb 2015 18:38:53 +0200 > Subject: [PATCH] drm/i915: gm45: work around hang during hibernation > MIME-Version: 1.0 > Content-Type: text/plain; charset=UTF-8 > Content-Transfer-Encoding: 8bit > > Bjørn reported that his machine hang during hibernation and eventually > bisected the problem to the following commit: > > commit da2bc1b9db3351addd293e5b82757efe1f77ed1d > Author: Imre Deak <imre.deak@intel.com> > Date: Thu Oct 23 19:23:26 2014 +0300 > > drm/i915: add poweroff_late handler > > The problem seems to be that after the kernel puts the device into D3 > the BIOS still tries to access it, or otherwise assumes that it's in D0. > This is clearly bogus, since ACPI mandates that devices are put into D3 > by the OSPM if they are not wake-up sources. In the future we want to > unify more of the driver's runtime and system suspend paths, for example > by skipping all the system suspend/hibernation hooks if the device is > runtime suspended already. Accordingly for all other platforms the goal > is still to properly power down the device during hibernation. > > Reference: http://lists.freedesktop.org/archives/intel-gfx/2015-February/060633.html > Reported-and-bisected-by: Bjørn Mork <bjorn@mork.no> > Signed-off-by: Imre Deak <imre.deak@intel.com> > --- > drivers/gpu/drm/i915/i915_drv.c | 27 ++++++++++++++++++++++----- > 1 file changed, 22 insertions(+), 5 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c > index 4badb23..67d212b 100644 > --- a/drivers/gpu/drm/i915/i915_drv.c > +++ b/drivers/gpu/drm/i915/i915_drv.c > @@ -637,7 +637,7 @@ static int i915_drm_suspend(struct drm_device *dev) > return 0; > } > > -static int i915_drm_suspend_late(struct drm_device *drm_dev) > +static int i915_drm_suspend_late(struct drm_device *drm_dev, bool hibernation) > { > struct drm_i915_private *dev_priv = drm_dev->dev_private; > int ret; > @@ -651,7 +651,14 @@ static int i915_drm_suspend_late(struct drm_device *drm_dev) > } > > pci_disable_device(drm_dev->pdev); > - pci_set_power_state(drm_dev->pdev, PCI_D3hot); > + /* > + * During hibernation on some GM45 platforms the BIOS may try to access > + * the device even though it's already in D3 and hang the machine. So > + * leave the device in D0 on those platforms and hope the BIOS will > + * power down the device properly. Please include the model of the known bad machine in this comment, to help future archaeologists. > + */ > + if (!(hibernation && IS_GM45(dev_priv))) > + pci_set_power_state(drm_dev->pdev, PCI_D3hot); > > return 0; > } > @@ -677,7 +684,7 @@ int i915_suspend_legacy(struct drm_device *dev, pm_message_t state) > if (error) > return error; > > - return i915_drm_suspend_late(dev); > + return i915_drm_suspend_late(dev, false); > } > > static int i915_drm_resume(struct drm_device *dev) > @@ -965,7 +972,17 @@ static int i915_pm_suspend_late(struct device *dev) > if (drm_dev->switch_power_state == DRM_SWITCH_POWER_OFF) > return 0; > > - return i915_drm_suspend_late(drm_dev); > + return i915_drm_suspend_late(drm_dev, false); > +} > + > +static int i915_pm_poweroff_late(struct device *dev) > +{ > + struct drm_device *drm_dev = dev_to_i915(dev)->dev; > + > + if (drm_dev->switch_power_state == DRM_SWITCH_POWER_OFF) > + return 0; > + > + return i915_drm_suspend_late(drm_dev, true); > } > > static int i915_pm_resume_early(struct device *dev) > @@ -1535,7 +1552,7 @@ static const struct dev_pm_ops i915_pm_ops = { > .thaw_early = i915_pm_resume_early, > .thaw = i915_pm_resume, > .poweroff = i915_pm_suspend, > - .poweroff_late = i915_pm_suspend_late, > + .poweroff_late = i915_pm_poweroff_late, > .restore_early = i915_pm_resume_early, > .restore = i915_pm_resume, > > -- > 2.1.0 > -- Ville Syrjälä Intel OTC ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] drm/i915: fix failure to power off after hibernate 2015-02-26 20:05 ` Ville Syrjälä @ 2015-02-26 20:29 ` Bjørn Mork 0 siblings, 0 replies; 14+ messages in thread From: Bjørn Mork @ 2015-02-26 20:29 UTC (permalink / raw) To: Ville Syrjälä Cc: Imre Deak, linux-kernel, Daniel Vetter, Jani Nikula, intel-gfx, dri-devel, stable Ville Syrjälä <ville.syrjala@linux.intel.com> writes: >> @@ -651,7 +651,14 @@ static int i915_drm_suspend_late(struct drm_device *drm_dev) >> } >> >> pci_disable_device(drm_dev->pdev); >> - pci_set_power_state(drm_dev->pdev, PCI_D3hot); >> + /* >> + * During hibernation on some GM45 platforms the BIOS may try to access >> + * the device even though it's already in D3 and hang the machine. So >> + * leave the device in D0 on those platforms and hope the BIOS will >> + * power down the device properly. > > Please include the model of the known bad machine in this comment, to > help future archaeologists. Here are some details: bjorn@nemi:~$ grep . /sys/class/dmi/id/{bios,product}* 2>/dev/null /sys/class/dmi/id/bios_date:12/19/2011 /sys/class/dmi/id/bios_vendor:LENOVO /sys/class/dmi/id/bios_version:6EET55WW (3.15 ) /sys/class/dmi/id/product_name:2776LEG /sys/class/dmi/id/product_version:ThinkPad X301 Please let me know if you need some other data. Bjørn ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2015-03-02 8:55 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2015-02-24 14:49 [BISECTED REGRESSION v3.18->v3.19-rc1] drm/i915: failure to poweroff after hibernation Bjørn Mork 2015-02-24 14:58 ` [PATCH] drm/i915: fix failure to power off after hibernate Bjørn Mork 2015-02-24 16:12 ` Imre Deak 2015-02-24 19:00 ` Bjørn Mork 2015-02-25 18:33 ` Imre Deak 2015-02-26 9:34 ` Bjørn Mork 2015-02-26 18:50 ` Imre Deak 2015-02-26 19:20 ` Bjørn Mork 2015-02-26 20:01 ` [Intel-gfx] " Daniel Vetter 2015-02-27 12:15 ` David Weinehall 2015-02-27 18:23 ` Imre Deak 2015-03-02 8:55 ` David Weinehall 2015-02-26 20:05 ` Ville Syrjälä 2015-02-26 20:29 ` Bjørn Mork
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).