LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* more intel drm issues (was Re: [git pull] drm intel only fixes)
@ 2011-01-20 2:39 Linus Torvalds
2011-01-20 4:55 ` Jeff Chua
0 siblings, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2011-01-20 2:39 UTC (permalink / raw)
To: Chris Wilson; +Cc: Jesse Barnes, Dave Airlie, linux-kernel, DRI mailing list
Ok, so I have a new issue that I'm currently bisecting but that people
may be able to figure out even befor emy bisect finishes.
On my slow Atom netbook (that I'm planning on using as my traveling
companion for LCA), suspend-to-RAM takes a long time with current git.
It's quite noticeable - it used to be pretty much instant, now it
takes three seconds. And it's all i915 graphics (although I haven't
bisected it down fully, I've bisected it down to the drm merge).
In the good case (like plain 2.6.37), a suspend event will look
something like this:
...
PM: suspend of devices complete after 147.646 msecs
...
but the i915 driver at some point made it take 3s:
...
PM: suspend of devices complete after 3059.656 msecs
...
which is definitely long enough to be worth fixing.
Maybe the person responsible will go "oh, that's obviously due to
xyz", and just fix it. But I'll continue to bisect in case nobody
steps up to admit to wasting time..
Linus
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: more intel drm issues (was Re: [git pull] drm intel only fixes)
2011-01-20 2:39 more intel drm issues (was Re: [git pull] drm intel only fixes) Linus Torvalds
@ 2011-01-20 4:55 ` Jeff Chua
2011-01-20 6:22 ` Linus Torvalds
0 siblings, 1 reply; 13+ messages in thread
From: Jeff Chua @ 2011-01-20 4:55 UTC (permalink / raw)
To: Linus Torvalds, Len Brown, Rafael J. Wysocki
Cc: Chris Wilson, Jesse Barnes, Dave Airlie, linux-kernel, DRI mailing list
[-- Attachment #1: Type: text/plain, Size: 1355 bytes --]
On Thu, Jan 20, 2011 at 10:39 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> Ok, so I have a new issue that I'm currently bisecting but that people
> may be able to figure out even befor emy bisect finishes.
>
> On my slow Atom netbook (that I'm planning on using as my traveling
> companion for LCA), suspend-to-RAM takes a long time with current git.
> It's quite noticeable - it used to be pretty much instant, now it
> takes three seconds. And it's all i915 graphics (although I haven't
> bisected it down fully, I've bisected it down to the drm merge).
>
> In the good case (like plain 2.6.37), a suspend event will look
> something like this:...
> PM: suspend of devices complete after 147.646 msecs
> but the i915 driver at some point made it take 3s:
> PM: suspend of devices complete after 3059.656 msecs
> which is definitely long enough to be worth fixing.
>
> Maybe the person responsible will go "oh, that's obviously due to
> xyz", and just fix it. But I'll continue to bisect in case nobody
> steps up to admit to wasting time..
Rafael send out two patches earlier. Could be related. I was facing
issue during resume.
Attached are the two patches. You'll need to apply both.
Len has suggested before to try to booting with "acpi_sleep=nonvs"
which works as well for my case.
Thanks,
Jeff
[-- Attachment #2: patch-resume1 --]
[-- Type: application/octet-stream, Size: 7134 bytes --]
Delivered-To: jeff.chua.linux@gmail.com
Received: by 10.42.164.7 with SMTP id e7cs211926icy;
Wed, 19 Jan 2011 13:29:53 -0800 (PST)
Received: by 10.216.19.139 with SMTP id n11mr1144343wen.78.1295472592165;
Wed, 19 Jan 2011 13:29:52 -0800 (PST)
Return-Path: <rjw@sisk.pl>
Received: from ogre.sisk.pl (ogre.sisk.pl [217.79.144.158])
by mx.google.com with ESMTP id h49si8956988wer.34.2011.01.19.13.29.51;
Wed, 19 Jan 2011 13:29:52 -0800 (PST)
Received-SPF: pass (google.com: best guess record for domain of rjw@sisk.pl designates 217.79.144.158 as permitted sender) client-ip=217.79.144.158;
Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of rjw@sisk.pl designates 217.79.144.158 as permitted sender) smtp.mail=rjw@sisk.pl
Received: from localhost (localhost.localdomain [127.0.0.1])
by ogre.sisk.pl (Postfix) with ESMTP id 725D51A26AE;
Wed, 19 Jan 2011 22:14:03 +0100 (CET)
Received: from ogre.sisk.pl ([127.0.0.1])
by localhost (ogre.sisk.pl [127.0.0.1]) (amavisd-new, port 10024) with ESMTP
id 13137-05; Wed, 19 Jan 2011 22:13:36 +0100 (CET)
Received: from ferrari.rjw.lan (220-bem-13.acn.waw.pl [82.210.184.220])
(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
(No client certificate requested)
by ogre.sisk.pl (Postfix) with ESMTP id 9F2D41A2701;
Wed, 19 Jan 2011 22:13:36 +0100 (CET)
From: "Rafael J. Wysocki" <rjw@sisk.pl>
To: Jeff Chua <jeff.chua.linux@gmail.com>
Subject: [1/2] ACPI: Introduce acpi_os_ioremap()
Date: Wed, 19 Jan 2011 22:27:14 +0100
User-Agent: KMail/1.13.5 (Linux/2.6.37+; KDE/4.4.4; x86_64; ; )
Cc: Len Brown <len.brown@intel.com>,
linux-pm@lists.linux-foundation.org,
Jack Steiner <steiner@sgi.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
ACPI Devel Maling List <linux-acpi@vger.kernel.org>
References: <AANLkTimBEnKTuR5aQyXvpZX8grQPw1eCaVgP-k9yBXhv@mail.gmail.com> <AANLkTimf3B=z1DCA2q+yi-ccyAJwmjkB_g8EP=ZTehmn@mail.gmail.com> <201101192219.07569.rjw@sisk.pl>
In-Reply-To: <201101192219.07569.rjw@sisk.pl>
MIME-Version: 1.0
Content-Type: Text/Plain;
charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201101192227.14928.rjw@sisk.pl>
X-Virus-Scanned: amavisd-new at ogre.sisk.pl using MkS_Vir for Linux
From: Rafael J. Wysocki <rjw@sisk.pl>
Commit ca9b600 (ACPI / PM: Make suspend_nvs_save() use
acpi_os_map_memory()) attempted to prevent the code in osl.c and
nvs.c from using different ioremap() variants by making the latter
use acpi_os_map_memory() for mapping the NVS pages. However, that
also requires acpi_os_unmap_memory() to be used for unmapping them,
which causes synchronize_rcu() to be executed many times in a row
unnecessarily and introduces substantial delays during resume on
some systems.
Instead of using acpi_os_map_memory() for mapping the NVS pages in
nvs.c introduce acpi_os_ioremap() calling ioremap_cache() and make
the code in both osl.c and nvs.c use it.
Reported-by: Jeff Chua <jeff.chua.linux@gmail.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
drivers/acpi/nvs.c | 7 ++++---
drivers/acpi/osl.c | 12 +++++++-----
include/linux/acpi.h | 3 ---
include/linux/acpi_io.h | 16 ++++++++++++++++
4 files changed, 27 insertions(+), 11 deletions(-)
Index: linux-2.6/drivers/acpi/osl.c
===================================================================
--- linux-2.6.orig/drivers/acpi/osl.c
+++ linux-2.6/drivers/acpi/osl.c
@@ -38,6 +38,7 @@
#include <linux/workqueue.h>
#include <linux/nmi.h>
#include <linux/acpi.h>
+#include <linux/acpi_io.h>
#include <linux/efi.h>
#include <linux/ioport.h>
#include <linux/list.h>
@@ -302,9 +303,10 @@ void __iomem *__init_refok
acpi_os_map_memory(acpi_physical_address phys, acpi_size size)
{
struct acpi_ioremap *map, *tmp_map;
- unsigned long flags, pg_sz;
+ unsigned long flags;
void __iomem *virt;
- phys_addr_t pg_off;
+ acpi_physical_address pg_off;
+ acpi_size pg_sz;
if (phys > ULONG_MAX) {
printk(KERN_ERR PREFIX "Cannot map memory that high\n");
@@ -320,7 +322,7 @@ acpi_os_map_memory(acpi_physical_address
pg_off = round_down(phys, PAGE_SIZE);
pg_sz = round_up(phys + size, PAGE_SIZE) - pg_off;
- virt = ioremap_cache(pg_off, pg_sz);
+ virt = acpi_os_ioremap(pg_off, pg_sz);
if (!virt) {
kfree(map);
return NULL;
@@ -642,7 +644,7 @@ acpi_os_read_memory(acpi_physical_addres
virt_addr = acpi_map_vaddr_lookup(phys_addr, size);
rcu_read_unlock();
if (!virt_addr) {
- virt_addr = ioremap_cache(phys_addr, size);
+ virt_addr = acpi_os_ioremap(phys_addr, size);
unmap = 1;
}
if (!value)
@@ -678,7 +680,7 @@ acpi_os_write_memory(acpi_physical_addre
virt_addr = acpi_map_vaddr_lookup(phys_addr, size);
rcu_read_unlock();
if (!virt_addr) {
- virt_addr = ioremap_cache(phys_addr, size);
+ virt_addr = acpi_os_ioremap(phys_addr, size);
unmap = 1;
}
Index: linux-2.6/drivers/acpi/nvs.c
===================================================================
--- linux-2.6.orig/drivers/acpi/nvs.c
+++ linux-2.6/drivers/acpi/nvs.c
@@ -12,6 +12,7 @@
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/acpi.h>
+#include <linux/acpi_io.h>
#include <acpi/acpiosxf.h>
/*
@@ -80,7 +81,7 @@ void suspend_nvs_free(void)
free_page((unsigned long)entry->data);
entry->data = NULL;
if (entry->kaddr) {
- acpi_os_unmap_memory(entry->kaddr, entry->size);
+ iounmap(entry->kaddr);
entry->kaddr = NULL;
}
}
@@ -114,8 +115,8 @@ int suspend_nvs_save(void)
list_for_each_entry(entry, &nvs_list, node)
if (entry->data) {
- entry->kaddr = acpi_os_map_memory(entry->phys_start,
- entry->size);
+ entry->kaddr = acpi_os_ioremap(entry->phys_start,
+ entry->size);
if (!entry->kaddr) {
suspend_nvs_free();
return -ENOMEM;
Index: linux-2.6/include/linux/acpi.h
===================================================================
--- linux-2.6.orig/include/linux/acpi.h
+++ linux-2.6/include/linux/acpi.h
@@ -306,9 +306,6 @@ extern acpi_status acpi_pci_osc_control_
u32 *mask, u32 req);
extern void acpi_early_init(void);
-int acpi_os_map_generic_address(struct acpi_generic_address *addr);
-void acpi_os_unmap_generic_address(struct acpi_generic_address *addr);
-
#else /* !CONFIG_ACPI */
#define acpi_disabled 1
Index: linux-2.6/include/linux/acpi_io.h
===================================================================
--- /dev/null
+++ linux-2.6/include/linux/acpi_io.h
@@ -0,0 +1,16 @@
+#ifndef _ACPI_IO_H_
+#define _ACPI_IO_H_
+
+#include <linux/io.h>
+#include <acpi/acpi.h>
+
+static inline void __iomem *acpi_os_ioremap(acpi_physical_address phys,
+ acpi_size size)
+{
+ return ioremap_cache(phys, size);
+}
+
+int acpi_os_map_generic_address(struct acpi_generic_address *addr);
+void acpi_os_unmap_generic_address(struct acpi_generic_address *addr);
+
+#endif
[-- Attachment #3: patch-resume2 --]
[-- Type: application/octet-stream, Size: 3639 bytes --]
Delivered-To: jeff.chua.linux@gmail.com
Received: by 10.42.164.7 with SMTP id e7cs211925icy;
Wed, 19 Jan 2011 13:29:51 -0800 (PST)
Received: by 10.227.151.204 with SMTP id d12mr1432539wbw.113.1295472590379;
Wed, 19 Jan 2011 13:29:50 -0800 (PST)
Return-Path: <rjw@sisk.pl>
Received: from ogre.sisk.pl (ogre.sisk.pl [217.79.144.158])
by mx.google.com with ESMTP id i10si11530447wej.187.2011.01.19.13.29.49;
Wed, 19 Jan 2011 13:29:50 -0800 (PST)
Received-SPF: pass (google.com: best guess record for domain of rjw@sisk.pl designates 217.79.144.158 as permitted sender) client-ip=217.79.144.158;
Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of rjw@sisk.pl designates 217.79.144.158 as permitted sender) smtp.mail=rjw@sisk.pl
Received: from localhost (localhost.localdomain [127.0.0.1])
by ogre.sisk.pl (Postfix) with ESMTP id 884BB1A26AC;
Wed, 19 Jan 2011 22:14:02 +0100 (CET)
Received: from ogre.sisk.pl ([127.0.0.1])
by localhost (ogre.sisk.pl [127.0.0.1]) (amavisd-new, port 10024) with ESMTP
id 12814-05; Wed, 19 Jan 2011 22:13:37 +0100 (CET)
Received: from ferrari.rjw.lan (220-bem-13.acn.waw.pl [82.210.184.220])
(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
(No client certificate requested)
by ogre.sisk.pl (Postfix) with ESMTP id D47561A2738;
Wed, 19 Jan 2011 22:13:36 +0100 (CET)
From: "Rafael J. Wysocki" <rjw@sisk.pl>
To: Jeff Chua <jeff.chua.linux@gmail.com>
Subject: [2/2] ACPI / PM: Call suspend_nvs_free() earlier during resume
Date: Wed, 19 Jan 2011 22:27:55 +0100
User-Agent: KMail/1.13.5 (Linux/2.6.37+; KDE/4.4.4; x86_64; ; )
Cc: Len Brown <len.brown@intel.com>,
linux-pm@lists.linux-foundation.org,
Jack Steiner <steiner@sgi.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
ACPI Devel Maling List <linux-acpi@vger.kernel.org>
References: <AANLkTimBEnKTuR5aQyXvpZX8grQPw1eCaVgP-k9yBXhv@mail.gmail.com> <AANLkTimf3B=z1DCA2q+yi-ccyAJwmjkB_g8EP=ZTehmn@mail.gmail.com> <201101192219.07569.rjw@sisk.pl>
In-Reply-To: <201101192219.07569.rjw@sisk.pl>
MIME-Version: 1.0
Content-Type: Text/Plain;
charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201101192227.55298.rjw@sisk.pl>
X-Virus-Scanned: amavisd-new at ogre.sisk.pl using MkS_Vir for Linux
From: Rafael J. Wysocki <rjw@sisk.pl>
It turns out that some device drivers map pages from the ACPI NVS
region during resume using ioremap(), which conflicts with
ioremap_cache() used for mapping those pages by the NVS save/restore
code in nvs.c.
Make the NVS pages mapped by the code in nvs.c be unmapped before
device drivers' resume routines run.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
drivers/acpi/sleep.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: linux-2.6/drivers/acpi/sleep.c
===================================================================
--- linux-2.6.orig/drivers/acpi/sleep.c
+++ linux-2.6/drivers/acpi/sleep.c
@@ -166,6 +166,7 @@ static void acpi_pm_finish(void)
u32 acpi_state = acpi_target_sleep_state;
acpi_ec_unblock_transactions();
+ suspend_nvs_free();
if (acpi_state == ACPI_STATE_S0)
return;
@@ -186,7 +187,6 @@ static void acpi_pm_finish(void)
*/
static void acpi_pm_end(void)
{
- suspend_nvs_free();
/*
* This is necessary in case acpi_pm_finish() is not called during a
* failing transition to a sleep state.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: more intel drm issues (was Re: [git pull] drm intel only fixes)
2011-01-20 4:55 ` Jeff Chua
@ 2011-01-20 6:22 ` Linus Torvalds
2011-01-20 10:17 ` Jeff Chua
2011-01-20 10:25 ` Chris Wilson
0 siblings, 2 replies; 13+ messages in thread
From: Linus Torvalds @ 2011-01-20 6:22 UTC (permalink / raw)
To: Jeff Chua
Cc: Len Brown, Rafael J. Wysocki, Chris Wilson, Jesse Barnes,
Dave Airlie, linux-kernel, DRI mailing list
On Wed, Jan 19, 2011 at 8:55 PM, Jeff Chua <jeff.chua.linux@gmail.com> wrote:
>
> Rafael send out two patches earlier. Could be related. I was facing
> issue during resume.
No, I'm aware of the rcu-synchronize thing, this isn't it. This is
really at the suspend stage, and I had bisected it down to the drm
changes.
In fact, by now I have bisected it down to a single commit. It's
another merge commit, which makes me a bit nervous (I bisected another
issue today, and it turned out to simply not be repeatable).
But this time the merge commit actually has a real conflict that got
fixed up in the merge, and the code around the conflict waits for
three seconds, and three seconds is also exactly how long the delay at
suspend time is. So I get the feeling that this time it's a real
issue, and what happened was that the merge may have been a mismerge.
Chris: as of commit 8d5203ca6253 ("Merge branch 'drm-intel-fixes' into
drm-intel-next") I'm getting that 3-second delay at suspend time. And
the merge diff looks like this:
+ struct drm_device *dev = ring->dev;
+ struct drm_i915_private *dev_priv = dev->dev_private;
unsigned long end;
- drm_i915_private_t *dev_priv = dev->dev_private;
u32 head;
- head = intel_read_status_page(ring, 4);
- if (head) {
- ring->head = head & HEAD_ADDR;
- ring->space = ring->head - (ring->tail + 8);
- if (ring->space < 0)
- ring->space += ring->size;
- if (ring->space >= n)
- return 0;
- }
-
trace_i915_ring_wait_begin (dev);
end = jiffies + 3 * HZ;
do {
and that whole do-loop with a 3-second timeout makes me *very*
suspicious. It used to have (in _one_ of the parent branches) that
code before it to return early if there was space in the ring, now it
doesn't any more - and that merge co-incides with my suspend suddenly
taking 3 seconds.
The same check that is deleted does exist inside the loop too, but
there it has some extra code it in (compare to "actual_head" and so
on), so I wonder if the fast-case was somehow hiding this issue.
But I don't know the code. I just see that whole "PM: suspend of
devices complete after x.xxx msecs" issue, and I can see the machine
taking too long to suspend.
Linus
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: more intel drm issues (was Re: [git pull] drm intel only fixes)
2011-01-20 6:22 ` Linus Torvalds
@ 2011-01-20 10:17 ` Jeff Chua
2011-01-20 10:25 ` Chris Wilson
1 sibling, 0 replies; 13+ messages in thread
From: Jeff Chua @ 2011-01-20 10:17 UTC (permalink / raw)
To: Linus Torvalds
Cc: Len Brown, Rafael J. Wysocki, Chris Wilson, Jesse Barnes,
Dave Airlie, linux-kernel, DRI mailing list
On Thu, Jan 20, 2011 at 2:22 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Wed, Jan 19, 2011 at 8:55 PM, Jeff Chua <jeff.chua.linux@gmail.com> wrote:
>>
>> Rafael send out two patches earlier. Could be related. I was facing
>> issue during resume.
>
> No, I'm aware of the rcu-synchronize thing, this isn't it. This is
> really at the suspend stage, and I had bisected it down to the drm
> changes.
>
> In fact, by now I have bisected it down to a single commit. It's
> another merge commit, which makes me a bit nervous (I bisected another
> issue today, and it turned out to simply not be repeatable).
>
> But this time the merge commit actually has a real conflict that got
> fixed up in the merge, and the code around the conflict waits for
> three seconds, and three seconds is also exactly how long the delay at
> suspend time is. So I get the feeling that this time it's a real
> issue, and what happened was that the merge may have been a mismerge.
I did see that once during suspend. But as you mentioned, 3 seconds,
and it wasn't repeatable. It was at the first suspend right after
reboot.
Jeff
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: more intel drm issues (was Re: [git pull] drm intel only fixes)
2011-01-20 6:22 ` Linus Torvalds
2011-01-20 10:17 ` Jeff Chua
@ 2011-01-20 10:25 ` Chris Wilson
2011-01-20 16:07 ` Linus Torvalds
1 sibling, 1 reply; 13+ messages in thread
From: Chris Wilson @ 2011-01-20 10:25 UTC (permalink / raw)
To: Linus Torvalds, Jeff Chua
Cc: Len Brown, Rafael J. Wysocki, Jesse Barnes, Dave Airlie,
linux-kernel, DRI mailing list
On Wed, 19 Jan 2011 22:22:48 -0800, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Wed, Jan 19, 2011 at 8:55 PM, Jeff Chua <jeff.chua.linux@gmail.com> wrote:
> >
> > Rafael send out two patches earlier. Could be related. I was facing
> > issue during resume.
>
> No, I'm aware of the rcu-synchronize thing, this isn't it. This is
> really at the suspend stage, and I had bisected it down to the drm
> changes.
>
> In fact, by now I have bisected it down to a single commit. It's
> another merge commit, which makes me a bit nervous (I bisected another
> issue today, and it turned out to simply not be repeatable).
>
> But this time the merge commit actually has a real conflict that got
> fixed up in the merge, and the code around the conflict waits for
> three seconds, and three seconds is also exactly how long the delay at
> suspend time is. So I get the feeling that this time it's a real
> issue, and what happened was that the merge may have been a mismerge.
>
> Chris: as of commit 8d5203ca6253 ("Merge branch 'drm-intel-fixes' into
> drm-intel-next") I'm getting that 3-second delay at suspend time. And
> the merge diff looks like this:
>
> + struct drm_device *dev = ring->dev;
> + struct drm_i915_private *dev_priv = dev->dev_private;
> unsigned long end;
> - drm_i915_private_t *dev_priv = dev->dev_private;
> u32 head;
>
> - head = intel_read_status_page(ring, 4);
> - if (head) {
> - ring->head = head & HEAD_ADDR;
> - ring->space = ring->head - (ring->tail + 8);
> - if (ring->space < 0)
> - ring->space += ring->size;
> - if (ring->space >= n)
> - return 0;
> - }
> -
> trace_i915_ring_wait_begin (dev);
> end = jiffies + 3 * HZ;
> do {
>
> and that whole do-loop with a 3-second timeout makes me *very*
> suspicious. It used to have (in _one_ of the parent branches) that
> code before it to return early if there was space in the ring, now it
> doesn't any more - and that merge co-incides with my suspend suddenly
> taking 3 seconds.
>
> The same check that is deleted does exist inside the loop too, but
> there it has some extra code it in (compare to "actual_head" and so
> on), so I wonder if the fast-case was somehow hiding this issue.
Right, the autoreported HEAD may have been already reset to 0 and so hit
the wraparound bug which caused it to exit early without actually
quiescing the ringbuffer.
Another possibility is that I added a 3s timeout waiting for a request if
IRQs were suspended:
commit b5ba177d8d71f011c23b1cabec99fdaddae65c4d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Dec 14 12:17:15 2010 +0000
drm/i915: Poll for seqno completion if IRQ is disabled
Both of those I think are symptoms of another problem, that perhaps during
suspend we are shutting down parts of the chip before idling?
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: more intel drm issues (was Re: [git pull] drm intel only fixes)
2011-01-20 10:25 ` Chris Wilson
@ 2011-01-20 16:07 ` Linus Torvalds
2011-01-20 17:38 ` Chris Wilson
2011-01-20 17:41 ` [PATCH] drm/i915/ringbuffer: Fix use of stale HEAD position whilst polling for space Chris Wilson
0 siblings, 2 replies; 13+ messages in thread
From: Linus Torvalds @ 2011-01-20 16:07 UTC (permalink / raw)
To: Chris Wilson
Cc: Jeff Chua, Len Brown, Rafael J. Wysocki, Jesse Barnes,
Dave Airlie, linux-kernel, DRI mailing list
On Thu, Jan 20, 2011 at 2:25 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
> Right, the autoreported HEAD may have been already reset to 0 and so hit
> the wraparound bug which caused it to exit early without actually
> quiescing the ringbuffer.
Yeah, that would explain the issue.
> Another possibility is that I added a 3s timeout waiting for a request if
> IRQs were suspended:
No, if IRQ's are actually suspended here, then that codepath is
totally buggy and would blow up (msleep() doesn't work, and jiffies
wouldn't advance on UP). So that's not it.
> Both of those I think are symptoms of another problem, that perhaps during
> suspend we are shutting down parts of the chip before idling?
That could be, but looking at the code, one thing strikes me: the
_normal_ case (of just waiting for "enough space" in the ring buffer)
doesn't need to use the exact case, but the "wait for ring buffer to
be totally empty" does.
Which means that the use of the "fast-but-inaccurate" 'head' sounds
wrong for the "wait for idle" case.
So can you explain the difference between
intel_read_status_page(ring, 4);
vs
I915_READ_HEAD(ring);
because from looking at the code, I get the notion that
"intel_read_status_page()" may not be exact. But what happens if that
inexact value matches our cached ring->actual_head, so we never even
try to read the exact case? Does it _stay_ inexact for arbitrarily
long times? If so, we might wait for the ring to empty forever (well,
until the timeout - the behavior I see), even though the ring really
_is_ empty. No?
Also, isn't that "head < ring->actual_head" buggy? What about the
overflow case? Not that we care, because afaik, 'actual_head' is not
actually used anywhere, so it should be called 'pointless_head'?
That code looks suspiciously bogus.
Linus
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: more intel drm issues (was Re: [git pull] drm intel only fixes)
2011-01-20 16:07 ` Linus Torvalds
@ 2011-01-20 17:38 ` Chris Wilson
2011-01-20 17:51 ` Linus Torvalds
2011-01-20 17:41 ` [PATCH] drm/i915/ringbuffer: Fix use of stale HEAD position whilst polling for space Chris Wilson
1 sibling, 1 reply; 13+ messages in thread
From: Chris Wilson @ 2011-01-20 17:38 UTC (permalink / raw)
To: Linus Torvalds
Cc: Jeff Chua, Len Brown, Rafael J. Wysocki, Jesse Barnes,
Dave Airlie, linux-kernel, DRI mailing list
On Thu, 20 Jan 2011 08:07:02 -0800, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Thu, Jan 20, 2011 at 2:25 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> >
> > Right, the autoreported HEAD may have been already reset to 0 and so hit
> > the wraparound bug which caused it to exit early without actually
> > quiescing the ringbuffer.
>
> Yeah, that would explain the issue.
>
> > Another possibility is that I added a 3s timeout waiting for a request if
> > IRQs were suspended:
>
> No, if IRQ's are actually suspended here, then that codepath is
> totally buggy and would blow up (msleep() doesn't work, and jiffies
> wouldn't advance on UP). So that's not it.
>
> > Both of those I think are symptoms of another problem, that perhaps during
> > suspend we are shutting down parts of the chip before idling?
>
> That could be, but looking at the code, one thing strikes me: the
> _normal_ case (of just waiting for "enough space" in the ring buffer)
> doesn't need to use the exact case, but the "wait for ring buffer to
> be totally empty" does.
>
> Which means that the use of the "fast-but-inaccurate" 'head' sounds
> wrong for the "wait for idle" case.
>
> So can you explain the difference between
>
> intel_read_status_page(ring, 4);
>
> vs
>
> I915_READ_HEAD(ring);
For I915_READ_HEAD, we need to wake up the GT power well, perform an
uncached read from the register, and then power down. This takes on the
order of a 100 microseconds (less if the GT is already powered up, etc).
Instead a read from the status page is from cached memory. The caveat here
is that value is only updated by the gfx engine when its HEAD crosses
every 64k boundary. So quite rarely.
> because from looking at the code, I get the notion that
> "intel_read_status_page()" may not be exact. But what happens if that
> inexact value matches our cached ring->actual_head, so we never even
> try to read the exact case? Does it _stay_ inexact for arbitrarily
> long times? If so, we might wait for the ring to empty forever (well,
> until the timeout - the behavior I see), even though the ring really
> _is_ empty. No?
Ah. Your analysis is spot on and this will cause a hang whilst polling if
we enter the loop with the last known head the same as the reported value.
> Also, isn't that "head < ring->actual_head" buggy? What about the
> overflow case? Not that we care, because afaik, 'actual_head' is not
> actually used anywhere, so it should be called 'pointless_head'?
This is the one case that I think is handled correctly, ignoring all the
other bugs.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] drm/i915/ringbuffer: Fix use of stale HEAD position whilst polling for space
2011-01-20 16:07 ` Linus Torvalds
2011-01-20 17:38 ` Chris Wilson
@ 2011-01-20 17:41 ` Chris Wilson
2011-01-21 8:35 ` Jiri Slaby
1 sibling, 1 reply; 13+ messages in thread
From: Chris Wilson @ 2011-01-20 17:41 UTC (permalink / raw)
Cc: Dave Airlie, Jesse Barnes, linux-kernel, dri-devel, Chris Wilson
During suspend, Linus found that his machine would hang for 3 seconds,
and identified that intel_ring_buffer_wait() was the culprit:
"Because from looking at the code, I get the notion that
"intel_read_status_page()" may not be exact. But what happens if that
inexact value matches our cached ring->actual_head, so we never even
try to read the exact case? Does it _stay_ inexact for arbitrarily
long times? If so, we might wait for the ring to empty forever (well,
until the timeout - the behavior I see), even though the ring really
_is_ empty."
As the reported HEAD position is only updated every time it crosses a
64k boundary, whilst draining the ring it is indeed likely to remain one
value. If that value matches the last known HEAD position, we never read
the true value from the register and so trigger a timeout.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
drivers/gpu/drm/i915/intel_ringbuffer.c | 40 ++++++++++++++++++------------
drivers/gpu/drm/i915/intel_ringbuffer.h | 1 -
2 files changed, 24 insertions(+), 17 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 51fbc5e..6218fa9 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -34,6 +34,14 @@
#include "i915_trace.h"
#include "intel_drv.h"
+static inline int ring_space(struct intel_ring_buffer *ring)
+{
+ int space = (ring->head & HEAD_ADDR) - (ring->tail + 8);
+ if (space < 0)
+ space += ring->size;
+ return space;
+}
+
static u32 i915_gem_get_seqno(struct drm_device *dev)
{
drm_i915_private_t *dev_priv = dev->dev_private;
@@ -204,11 +212,9 @@ static int init_ring_common(struct intel_ring_buffer *ring)
if (!drm_core_check_feature(ring->dev, DRIVER_MODESET))
i915_kernel_lost_context(ring->dev);
else {
- ring->head = I915_READ_HEAD(ring) & HEAD_ADDR;
+ ring->head = I915_READ_HEAD(ring);
ring->tail = I915_READ_TAIL(ring) & TAIL_ADDR;
- ring->space = ring->head - (ring->tail + 8);
- if (ring->space < 0)
- ring->space += ring->size;
+ ring->space = ring_space(ring);
}
return 0;
@@ -921,7 +927,7 @@ static int intel_wrap_ring_buffer(struct intel_ring_buffer *ring)
}
ring->tail = 0;
- ring->space = ring->head - 8;
+ ring->space = ring_space(ring);
return 0;
}
@@ -933,20 +939,22 @@ int intel_wait_ring_buffer(struct intel_ring_buffer *ring, int n)
unsigned long end;
u32 head;
+ /* If the reported head position has wrapped or hasn't advanced,
+ * fallback to the slow and accurate path.
+ */
+ head = intel_read_status_page(ring, 4);
+ if (head > ring->head) {
+ ring->head = head;
+ ring->space = ring_space(ring);
+ if (ring->space >= n)
+ return 0;
+ }
+
trace_i915_ring_wait_begin (dev);
end = jiffies + 3 * HZ;
do {
- /* If the reported head position has wrapped or hasn't advanced,
- * fallback to the slow and accurate path.
- */
- head = intel_read_status_page(ring, 4);
- if (head < ring->actual_head)
- head = I915_READ_HEAD(ring);
- ring->actual_head = head;
- ring->head = head & HEAD_ADDR;
- ring->space = ring->head - (ring->tail + 8);
- if (ring->space < 0)
- ring->space += ring->size;
+ ring->head = I915_READ_HEAD(ring);
+ ring->space = ring_space(ring);
if (ring->space >= n) {
trace_i915_ring_wait_end(dev);
return 0;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 61d5220..6d6fde8 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -47,7 +47,6 @@ struct intel_ring_buffer {
struct drm_device *dev;
struct drm_i915_gem_object *obj;
- u32 actual_head;
u32 head;
u32 tail;
int space;
--
1.7.2.3
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: more intel drm issues (was Re: [git pull] drm intel only fixes)
2011-01-20 17:38 ` Chris Wilson
@ 2011-01-20 17:51 ` Linus Torvalds
2011-01-20 20:44 ` Linus Torvalds
0 siblings, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2011-01-20 17:51 UTC (permalink / raw)
To: Chris Wilson
Cc: Jeff Chua, Len Brown, Rafael J. Wysocki, Jesse Barnes,
Dave Airlie, linux-kernel, DRI mailing list
On Thu, Jan 20, 2011 at 9:38 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
>> because from looking at the code, I get the notion that
>> "intel_read_status_page()" may not be exact. But what happens if that
>> inexact value matches our cached ring->actual_head, so we never even
>> try to read the exact case? Does it _stay_ inexact for arbitrarily
>> long times? If so, we might wait for the ring to empty forever (well,
>> until the timeout - the behavior I see), even though the ring really
>> _is_ empty. No?
>
> Ah. Your analysis is spot on and this will cause a hang whilst polling if
> we enter the loop with the last known head the same as the reported value.
So how about just doing this in the loop? It will mean that the
_first_ read uses the fast cached one (the common case, hopefully),
but then if we loop, we'll use the slow exact one.
(cut-and-paste, so whitespace isn't good):
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c
b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 03e3370..11bbfb5 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -961,6 +961,8 @@ int intel_wait_ring_buffer(struct
intel_ring_buffer *ring, int n)
msleep(1);
if (atomic_read(&dev_priv->mm.wedged))
return -EAGAIN;
+ /* Force a re-read. FIXME: what if read_status_page
says 0 too */
+ ring->actual_head = 0;
} while (!time_after(jiffies, end));
trace_i915_ring_wait_end (dev);
return -EBUSY;
but to get rid of the FIXME you should probably get rid of
"actual_head" entirely, and just use a local variable as a simple
boolean flag instead.
Hmm?
Linus
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: more intel drm issues (was Re: [git pull] drm intel only fixes)
2011-01-20 17:51 ` Linus Torvalds
@ 2011-01-20 20:44 ` Linus Torvalds
0 siblings, 0 replies; 13+ messages in thread
From: Linus Torvalds @ 2011-01-20 20:44 UTC (permalink / raw)
To: Chris Wilson
Cc: Jeff Chua, Len Brown, Rafael J. Wysocki, Jesse Barnes,
Dave Airlie, linux-kernel, DRI mailing list
[-- Attachment #1: Type: text/plain, Size: 1475 bytes --]
On Thu, Jan 20, 2011 at 9:51 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So how about just doing this in the loop? It will mean that the
> _first_ read uses the fast cached one (the common case, hopefully),
> but then if we loop, we'll use the slow exact one.
>
> (cut-and-paste, so whitespace isn't good):
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c
> b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 03e3370..11bbfb5 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -961,6 +961,8 @@ int intel_wait_ring_buffer(struct
> intel_ring_buffer *ring, int n)
> msleep(1);
> if (atomic_read(&dev_priv->mm.wedged))
> return -EAGAIN;
> + /* Force a re-read. FIXME: what if read_status_page
> says 0 too */
> + ring->actual_head = 0;
> } while (!time_after(jiffies, end));
> trace_i915_ring_wait_end (dev);
> return -EBUSY;
This makes no difference. And the reason is exactly that we get the
zero case that I had in the comment.
But THIS attached patch actually seems to fix the slow suspend for me.
I removed the accesses to "actual_head", because that whole field
seems to not be used.
So it seems like the intel_read_status_page() thing returns zero
forever when suspending. Maybe you can explain why.
Linus
[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 1716 bytes --]
drivers/gpu/drm/i915/intel_ringbuffer.c | 5 +++--
drivers/gpu/drm/i915/intel_ringbuffer.h | 1 -
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 03e3370..f6b9baa 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -928,6 +928,7 @@ static int intel_wrap_ring_buffer(struct intel_ring_buffer *ring)
int intel_wait_ring_buffer(struct intel_ring_buffer *ring, int n)
{
+ int reread = 0;
struct drm_device *dev = ring->dev;
struct drm_i915_private *dev_priv = dev->dev_private;
unsigned long end;
@@ -940,9 +941,8 @@ int intel_wait_ring_buffer(struct intel_ring_buffer *ring, int n)
* fallback to the slow and accurate path.
*/
head = intel_read_status_page(ring, 4);
- if (head < ring->actual_head)
+ if (reread)
head = I915_READ_HEAD(ring);
- ring->actual_head = head;
ring->head = head & HEAD_ADDR;
ring->space = ring->head - (ring->tail + 8);
if (ring->space < 0)
@@ -961,6 +961,7 @@ int intel_wait_ring_buffer(struct intel_ring_buffer *ring, int n)
msleep(1);
if (atomic_read(&dev_priv->mm.wedged))
return -EAGAIN;
+ reread = 1;
} while (!time_after(jiffies, end));
trace_i915_ring_wait_end (dev);
return -EBUSY;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index be9087e..5b0abfa 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -47,7 +47,6 @@ struct intel_ring_buffer {
struct drm_device *dev;
struct drm_i915_gem_object *obj;
- u32 actual_head;
u32 head;
u32 tail;
int space;
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] drm/i915/ringbuffer: Fix use of stale HEAD position whilst polling for space
2011-01-20 17:41 ` [PATCH] drm/i915/ringbuffer: Fix use of stale HEAD position whilst polling for space Chris Wilson
@ 2011-01-21 8:35 ` Jiri Slaby
2011-01-21 10:11 ` [PATCH] drm/i915: Fix use of invalid array size for ring->sync_seqno Chris Wilson
0 siblings, 1 reply; 13+ messages in thread
From: Jiri Slaby @ 2011-01-21 8:35 UTC (permalink / raw)
To: Chris Wilson; +Cc: Dave Airlie, Jesse Barnes, linux-kernel, dri-devel
On 01/20/2011 06:41 PM, Chris Wilson wrote:
> During suspend, Linus found that his machine would hang for 3 seconds,
> and identified that intel_ring_buffer_wait() was the culprit:
FWIW works for me. From 4s I'm back to:
PM: suspend of devices complete after 1087.670 msecs
BTW I did this change as there are loops running out of bounds of this
array added in 1ec14ad3.
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -52,7 +52,7 @@ struct intel_ring_buffer {
u32 irq_seqno; /* last seq seem at irq
time */
u32 waiting_seqno;
- u32 sync_seqno[I915_NUM_RINGS-1];
+ u32 sync_seqno[I915_NUM_RINGS];
atomic_t irq_refcount;
bool __must_check (*irq_get)(struct intel_ring_buffer *ring);
void (*irq_put)(struct intel_ring_buffer *ring);
thanks,
--
js
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] drm/i915: Fix use of invalid array size for ring->sync_seqno
2011-01-21 8:35 ` Jiri Slaby
@ 2011-01-21 10:11 ` Chris Wilson
2011-01-22 22:24 ` Jiri Slaby
0 siblings, 1 reply; 13+ messages in thread
From: Chris Wilson @ 2011-01-21 10:11 UTC (permalink / raw)
To: Jiri Slaby
Cc: Dave Airlie, Jesse Barnes, linux-kernel, dri-devel, Chris Wilson
There are I915_NUM_RINGS-1 inter-ring synchronisation counters, but we
were clearing I915_NUM_RINGS of them. Oops.
Reported-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
Jiri, does this catch the invalid array access? For 3 rings, there should
only be 2 sync counters...
---
drivers/gpu/drm/i915/i915_gem.c | 2 +-
drivers/gpu/drm/i915/i915_gem_execbuffer.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 3dfc848..812b97b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1857,7 +1857,7 @@ i915_gem_retire_requests_ring(struct drm_device *dev,
seqno = ring->get_seqno(ring);
- for (i = 0; i < I915_NUM_RINGS; i++)
+ for (i = 0; i < ARRAY_SIZE(ring->sync_seqno); i++)
if (seqno >= ring->sync_seqno[i])
ring->sync_seqno[i] = 0;
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index dcfdf41..d2f445e 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1175,7 +1175,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
goto err;
seqno = i915_gem_next_request_seqno(dev, ring);
- for (i = 0; i < I915_NUM_RINGS-1; i++) {
+ for (i = 0; i < ARRAY_SIZE(ring->sync_seqno); i++) {
if (seqno < ring->sync_seqno[i]) {
/* The GPU can not handle its semaphore value wrapping,
* so every billion or so execbuffers, we need to stall
--
1.7.2.3
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] drm/i915: Fix use of invalid array size for ring->sync_seqno
2011-01-21 10:11 ` [PATCH] drm/i915: Fix use of invalid array size for ring->sync_seqno Chris Wilson
@ 2011-01-22 22:24 ` Jiri Slaby
0 siblings, 0 replies; 13+ messages in thread
From: Jiri Slaby @ 2011-01-22 22:24 UTC (permalink / raw)
To: Chris Wilson; +Cc: Dave Airlie, Jesse Barnes, linux-kernel, dri-devel
On 01/21/2011 11:11 AM, Chris Wilson wrote:
> There are I915_NUM_RINGS-1 inter-ring synchronisation counters, but we
> were clearing I915_NUM_RINGS of them. Oops.
>
> Reported-by: Jiri Slaby <jirislaby@gmail.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>
> Jiri, does this catch the invalid array access? For 3 rings, there should
> only be 2 sync counters...
Yup, it's gone.
thanks,
--
js
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2011-01-22 22:24 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-20 2:39 more intel drm issues (was Re: [git pull] drm intel only fixes) Linus Torvalds
2011-01-20 4:55 ` Jeff Chua
2011-01-20 6:22 ` Linus Torvalds
2011-01-20 10:17 ` Jeff Chua
2011-01-20 10:25 ` Chris Wilson
2011-01-20 16:07 ` Linus Torvalds
2011-01-20 17:38 ` Chris Wilson
2011-01-20 17:51 ` Linus Torvalds
2011-01-20 20:44 ` Linus Torvalds
2011-01-20 17:41 ` [PATCH] drm/i915/ringbuffer: Fix use of stale HEAD position whilst polling for space Chris Wilson
2011-01-21 8:35 ` Jiri Slaby
2011-01-21 10:11 ` [PATCH] drm/i915: Fix use of invalid array size for ring->sync_seqno Chris Wilson
2011-01-22 22:24 ` Jiri Slaby
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).