LKML Archive on lore.kernel.org
From: Felix Kuehling <felix.kuehling@amd.com>
To: "Michel Dänzer" <michel@daenzer.net>,
	"Christian König" <christian.koenig@amd.com>,
	"Gabriel C" <nix.or.die@gmail.com>,
	"Philip Yang" <Philip.Yang@amd.com>
Cc: Jean-Marc Valin <jmvalin@mozilla.com>,
	Dave Airlie <airlied@linux.ie>,
	LKML <linux-kernel@vger.kernel.org>,
	dri-devel@lists.freedesktop.org,
	alexander.deucher@amd.com,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: AMD graphics performance regression in 4.15 and later
Date: Fri, 20 Apr 2018 15:40:20 -0400
Message-ID: <35c599a3-0042-4f00-52e4-9d17164b93b1@amd.com>
In-Reply-To: <2a864040-3888-c30a-2fab-6ff637dddda4@daenzer.net>

[+Philip]

On 2018-04-20 10:47 AM, Michel Dänzer wrote:
> On 2018-04-11 11:37 AM, Christian König wrote:
>> On 11.04.2018 at 06:00, Gabriel C wrote:
>>> 2018-04-09 11:42 GMT+02:00 Christian König
>>> <ckoenig.leichtzumerken@gmail.com>:
>>>> On 07.04.2018 at 00:00, Jean-Marc Valin wrote:
>>>>> Hi Christian,
>>>>>
>>>>> Thanks for the info. FYI, I've also opened a Firefox bug for that at:
>>>>> https://bugzilla.mozilla.org/show_bug.cgi?id=1448778
>>>>> Feel free to comment since you have a better understanding of what's
>>>>> going on.
>>>>>
>>>>> One last question: right now I'm running 4.15.0 with the "offending"
>>>>> patch reverted. Is that safe to run or are there possible bad
>>>>> interactions with other changes?
>>>> That should work without problems.
>>>>
>>>> But I just had another idea as well: if you want, you could still test
>>>> the new code path which will be used in 4.17.
>>>>
>>> While Firefox may do some strange things, this is not only about Firefox.
>>>
>>> With your patches my EPYC box is unusable with 4.15++ kernels.
>>> The whole desktop is acting weird. This one is using
>>> a Cape Verde PRO [Radeon HD 7750/8740 / R7 250E] GPU.
>>>
>>> Box is 2 * EPYC 7281 with 128 GB ECC RAM
>>>
>>> Also a 14C Xeon box with a HD7700 is broken the same way.
>> The hardware is irrelevant for this. We need to know what software stack
>> you use on top of it.
>>
>> E.g. desktop environment/Mesa and DDX version etc...
>>
>>> Everything breaks in X: scrolling, moving windows, flickering etc.
>>>
>>>
>>> Reverting f4c809914a7c3e4a59cf543da6c2a15d0f75ee38 and
>>> 648bc3574716400acc06f99915815f80d9563783
>>> from a 4.15 kernel makes things work again.
>>>
>>>
>>>> Backporting all the detection logic is too invasive, but you could
>>>> just go into drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c and forcefully
>>>> use the other code path.
>>>>
>>>> Just look out for "#ifdef CONFIG_SWIOTLB" checks and disable those.
>>>>
>>> Well, you really can't be serious about these suggestions? Are you?
>>>
>>> Telling people to #if 0 random code is not a solution.
>> That is for testing and not a permanent solution.
>>
>>> You broke existing working userland with your patches, and at least
>>> please fix that for 4.16.
>>>
>>> I can help testing code for 4.17/++ if you wish, but that is a
>>> *different* story.
>> Please test Alex's amd-staging-drm-next branch from
>> git://people.freedesktop.org/~agd5f/linux.
> I think we're still missing something here.
>
> I'm currently running 4.16.2 + the DRM subsystem changes which are going
> into 4.17 (so I have the changes Christian is referring to) with a
> Kaveri APU, and I'm seeing similar symptoms as described by Jean-Marc.
> Some observations:
>
> Firefox, Thunderbird, or worst, gnome-shell, can freeze for up to on the
> order of a minute, during which the kernel is spending most of one
> core's cycles inside alloc_pages (__alloc_pages_nodemask to be more
> precise), called from ttm_alloc_new_pages.

Philip debugged a similar problem with a KFD memory stress test about
two weeks ago, where the kernel was seemingly stuck in an infinite loop
trying to allocate huge pages. I'm pasting his analysis for the record:

> [...] it uses huge_flags GFP_TRANSHUGE to call alloc_pages(), this
> seems a corner case inside __alloc_pages_slowpath(), it never exits
> but goes to retry path every time. It can reclaim pages and
> did_some_progress (as a result, no_progress_loops is reset to 0 every
> loop, never reach MAX_RECLAIM_RETRIES) but cannot finish huge page
> allocations under this specific memory pressure.

As a workaround to unblock our release branch testing, we removed
transparent huge page allocation from ttm_get_pages. We're seeing this
as far back as 4.13 on our release branch.

If we're really talking about the same problem, I don't think it's
caused by recent page allocator changes, but rather exposed by recent
TTM changes.

Regards,
  Felix

>
> At least in the case of Firefox, this happens due to Mesa internal BO
> allocations for glTex(Sub)Image, so it's not obvious that Firefox is
> doing something wrong.
>
> I never noticed this before this week. Before, I was running 4.15.y +
> DRM subsystem changes from 4.16. Maybe something has changed in core
> code, trying harder to allocate huge pages.
>
>
> Maybe TTM should only try to use any huge pages that happen to be
> available, not spend any (/ "too much"?) additional effort trying to
> free up huge pages?
>
>
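The retry behaviour in Philip's analysis above can be illustrated with a toy userspace model. This is only a sketch of the reasoning in the message, not kernel code: simulate_slowpath(), its callbacks, the noretry switch and the max_iterations cap are all hypothetical names introduced here, and the retry limit is an illustrative value. It assumes, as the analysis describes, that any reclaim progress resets the no-progress counter.

```python
# Toy model of the retry loop described in the analysis; NOT kernel code.
# All names below are hypothetical and exist only for illustration.
MAX_RECLAIM_RETRIES = 16  # illustrative limit, named after the kernel constant


def simulate_slowpath(reclaim_progress, alloc_succeeds,
                      noretry=False, max_iterations=1000):
    """Return (outcome, iterations) for one simulated allocation attempt.

    reclaim_progress(i) -> number of pages reclaimed on iteration i
    alloc_succeeds(i)   -> True if the huge-page allocation works on iteration i
    noretry             -> give up after the first failure (opportunistic mode)
    max_iterations      -> cap so the model terminates even if it would spin
    """
    no_progress_loops = 0
    for i in range(1, max_iterations + 1):
        if alloc_succeeds(i):
            return ("allocated", i)
        if noretry:
            # Opportunistic caller: take a huge page only if one is available.
            return ("failed_fast", i)
        did_some_progress = reclaim_progress(i)
        if did_some_progress:
            # Reclaim freed something, so the counter starts over.
            no_progress_loops = 0
        else:
            no_progress_loops += 1
        if no_progress_loops >= MAX_RECLAIM_RETRIES:
            return ("gave_up", i)
    return ("still_retrying", max_iterations)


if __name__ == "__main__":
    # Reclaim always frees a few pages, but never enough contiguous memory.
    print(simulate_slowpath(lambda i: 1, lambda i: False))
    # Same pressure, but the caller refuses to retry.
    print(simulate_slowpath(lambda i: 1, lambda i: False, noretry=True))
```

In this model, a caller whose reclaim always makes a little progress never hits the retry limit and only stops at the artificial iteration cap, which matches the "never exits" symptom described above. The noretry variant corresponds loosely to the suggestion that TTM should use huge pages only when they happen to be available.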