LKML Archive on lore.kernel.org
From: "Michel Dänzer" <michel@daenzer.net>
To: "Felix Kuehling" <felix.kuehling@amd.com>,
    "Christian König" <christian.koenig@amd.com>,
    "Gabriel C" <nix.or.die@gmail.com>,
    "Philip Yang" <Philip.Yang@amd.com>
Cc: Jean-Marc Valin <jmvalin@mozilla.com>,
    Dave Airlie <airlied@linux.ie>,
    LKML <linux-kernel@vger.kernel.org>,
    dri-devel@lists.freedesktop.org,
    alexander.deucher@amd.com,
    Andrew Morton <akpm@linux-foundation.org>,
    Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: AMD graphics performance regression in 4.15 and later
Date: Mon, 23 Apr 2018 12:23:49 +0200
Message-ID: <45fc9cac-f37e-833f-5f4a-811db206166d@daenzer.net>
In-Reply-To: <35c599a3-0042-4f00-52e4-9d17164b93b1@amd.com>

On 2018-04-20 09:40 PM, Felix Kuehling wrote:
> On 2018-04-20 10:47 AM, Michel Dänzer wrote:
>> On 2018-04-11 11:37 AM, Christian König wrote:
>>> On 11.04.2018 at 06:00, Gabriel C wrote:
>>>> 2018-04-09 11:42 GMT+02:00 Christian König
>>>> <ckoenig.leichtzumerken@gmail.com>:
>>>>> On 07.04.2018 at 00:00, Jean-Marc Valin wrote:
>>>>>> Hi Christian,
>>>>>>
>>>>>> Thanks for the info. FYI, I've also opened a Firefox bug for that at:
>>>>>> https://bugzilla.mozilla.org/show_bug.cgi?id=1448778
>>>>>> Feel free to comment since you have a better understanding of what's
>>>>>> going on.
>>>>>>
>>>>>> One last question: right now I'm running 4.15.0 with the "offending"
>>>>>> patch reverted. Is that safe to run or are there possible bad
>>>>>> interactions with other changes.
>>>>> That should work without problems.
>>>>>
>>>>> But I just had another idea as well, if you want you could still test
>>>>> the
>>>>> new code path which will be using in 4.17.
>>>>>
>>>> While Firefox may do some strange things is not about only Firefox.
>>>>
>>>> With your patches my EPYC box is unusable with 4.15++ kernels.
>>>> The whole Desktop is acting weird. This one is using
>>>> an Cape Verde PRO [Radeon HD 7750/8740 / R7 250E] GPU.
>>>>
>>>> Box is 2 * EPYC 7281 with 128 GB ECC RAM
>>>>
>>>> Also a 14C Xeon box with a HD7700 is broken same way.
>>> The hardware is irrelevant for this. We need to know what software stack
>>> you use on top of it.
>>>
>>> E.g. desktop environment/Mesa and DDX version etc...
>>>
>>>> Everything breaks in X .. scrolling , moving windows , flickering etc.
>>>>
>>>>
>>>> reverting f4c809914a7c3e4a59cf543da6c2a15d0f75ee38 and
>>>> 648bc3574716400acc06f99915815f80d9563783
>>>> from an 4.15 kernel makes things work again.
>>>>
>>>>
>>>>> Backporting all the detection logic is to invasive, but you could
>>>>> just go
>>>>> into drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c and forcefull use the other
>>>>> code path.
>>>>>
>>>>> Just look out for "#ifdef CONFIG_SWIOTLB" checks and disable those.
>>>>>
>>>> Well you really can't be serious about these suggestions ? Are you ?
>>>>
>>>> Telling peoples to #if 0 random code is not a solution.
>>> That is for testing and not a permanent solution.
>>>
>>>> You broke existsing working userland with your patches and at least
>>>> please fix that for 4.16.
>>>>
>>>> I can help testing code for 4.17/++ if you wish but that is
>>>> *different* storry.
>>> Please test Alex's amd-staging-drm-next branch from
>>> git://people.freedesktop.org/~agd5f/linux.
>> I think we're still missing something here.
>>
>> I'm currently running 4.16.2 + the DRM subsystem changes which are going
>> into 4.17 (so I have the changes Christian is referring to) with a
>> Kaveri APU, and I'm seeing similar symptoms as described by Jean-Marc.
>> Some observations:
>>
>> Firefox, Thunderbird, or worst, gnome-shell, can freeze for up to on the
>> order of a minute, during which the kernel is spending most of one
>> core's cycles inside alloc_pages (__alloc_pages_nodemask to be more
>> precise), called from ttm_alloc_new_pages.
> Philip debugged a similar problem with a KFD memory stress test about
> two weeks ago, where the kernel was seemingly stuck in an infinite loop
> trying to allocate huge pages. I'm pasting his analysis for the record:
>
>> [...] it uses huge_flags GFP_TRANSHUGE to call alloc_pages(), this
>> seems a corner case inside __alloc_pages_slowpath(), it never exits
>> but goes to retry path every time. It can reclaim pages and
>> did_some_progress (as a result, no_progress_loops is reset to 0 every
>> loop, never reach MAX_RECLAIM_RETRIES) but cannot finish huge page
>> allocations under this specific memory pressure.
> As a workaround to unblock our release branch testing we removed
> transparent huge page allocation from ttm_get_pages. We're seeing this
> as far back as 4.13 on our release branch.

Thanks for sharing this. In the future, please raise issues like this on
the public mailing lists from the beginning.

> If we're really talking about the same problem, I don't think it's
> caused by recent page allocator changes, but rather exposed by recent
> TTM changes.

It sounds related, but probably not exactly the same problem. I already
had the TTM code using GFP_TRANSHUGE before I ran into the issue. Also,
__alloc_pages_slowpath eventually succeeds for me, it can just take up
to about a minute.

I'm currently testing using (GFP_TRANSHUGE_LIGHT | __GFP_NORETRY)
instead of GFP_TRANSHUGE in TTM.

-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
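
For illustration only, the retry behaviour described in Philip's analysis
can be modelled in userspace. This is a minimal sketch, not kernel code:
simulate_slowpath(), struct alloc_result, and the attempt budget are
hypothetical names made up for the model, and MAX_RECLAIM_RETRIES is
assumed to be 16 here. The model assumes every allocation attempt fails
and shows why a loop whose no-progress counter keeps being reset never
gives up on its own, while a no-retry request returns after one attempt.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value for this model; taken to mirror the kernel's constant. */
#define MAX_RECLAIM_RETRIES 16

/* Hypothetical result record for the model, not a kernel structure. */
struct alloc_result {
    bool gave_up;   /* true if the loop decided to stop retrying */
    int  attempts;  /* number of (failed) allocation attempts made */
};

/*
 * Userspace model of the retry loop described in the analysis above.
 *
 * noretry:                models a request that must not retry
 *                         (the idea behind __GFP_NORETRY).
 * reclaim_makes_progress: reclaim frees some pages on every pass, but
 *                         never enough for the huge page to succeed.
 * budget:                 cap on attempts so the simulation terminates.
 */
struct alloc_result simulate_slowpath(bool noretry,
                                      bool reclaim_makes_progress,
                                      int budget)
{
    struct alloc_result r = { false, 0 };
    int no_progress_loops = 0;

    assert(budget > 0);

    while (r.attempts < budget) {
        r.attempts++;  /* the allocation attempt itself always fails here */

        if (noretry) {
            /* Caller asked not to retry: fail fast after one attempt. */
            r.gave_up = true;
            return r;
        }

        /* Any reclaim progress resets the counter, as described above. */
        if (reclaim_makes_progress)
            no_progress_loops = 0;
        else
            no_progress_loops++;

        if (no_progress_loops > MAX_RECLAIM_RETRIES) {
            r.gave_up = true;
            return r;
        }
    }

    /* Budget exhausted: the loop itself never chose to stop. */
    return r;
}
```

In this model, simulate_slowpath(false, true, 1000) uses up the whole
budget without ever giving up, simulate_slowpath(true, true, 1000) stops
after a single attempt, and simulate_slowpath(false, false, 1000) stops
once the counter exceeds MAX_RECLAIM_RETRIES. The real allocator has
further exit conditions (compaction, OOM handling) that are deliberately
left out.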