LKML Archive on lore.kernel.org
From: Felix Kuehling <felix.kuehling@amd.com>
To: "Michel Dänzer" <michel@daenzer.net>,
	"Christian König" <christian.koenig@amd.com>,
	"Gabriel C" <nix.or.die@gmail.com>,
	"Philip Yang" <Philip.Yang@amd.com>
Cc: Jean-Marc Valin <jmvalin@mozilla.com>,
	Dave Airlie <airlied@linux.ie>,
	LKML <linux-kernel@vger.kernel.org>,
	dri-devel@lists.freedesktop.org, alexander.deucher@amd.com,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: AMD graphics performance regression in 4.15 and later
Date: Fri, 20 Apr 2018 15:40:20 -0400
Message-ID: <35c599a3-0042-4f00-52e4-9d17164b93b1@amd.com>
In-Reply-To: <2a864040-3888-c30a-2fab-6ff637dddda4@daenzer.net>

[+Philip]

On 2018-04-20 10:47 AM, Michel Dänzer wrote:
> On 2018-04-11 11:37 AM, Christian König wrote:
>> Am 11.04.2018 um 06:00 schrieb Gabriel C:
>>> 2018-04-09 11:42 GMT+02:00 Christian König
>>> <ckoenig.leichtzumerken@gmail.com>:
>>>> Am 07.04.2018 um 00:00 schrieb Jean-Marc Valin:
>>>>> Hi Christian,
>>>>>
>>>>> Thanks for the info. FYI, I've also opened a Firefox bug for that at:
>>>>> https://bugzilla.mozilla.org/show_bug.cgi?id=1448778
>>>>> Feel free to comment since you have a better understanding of what's
>>>>> going on.
>>>>>
>>>>> One last question: right now I'm running 4.15.0 with the "offending"
>>>>> patch reverted. Is that safe to run, or are there possible bad
>>>>> interactions with other changes?
>>>> That should work without problems.
>>>>
>>>> But I just had another idea as well: if you want, you could still test
>>>> the new code path which will be used in 4.17.
>>>>
>>> While Firefox may do some strange things, this is not only about Firefox.
>>>
>>> With your patches my EPYC box is unusable with 4.15 and later kernels.
>>> The whole desktop is acting weird. This one is using
>>> a Cape Verde PRO [Radeon HD 7750/8740 / R7 250E] GPU.
>>>
>>> The box has 2 * EPYC 7281 with 128 GB ECC RAM.
>>>
>>> A 14C Xeon box with an HD7700 is also broken in the same way.
>> The hardware is irrelevant for this. We need to know what software stack
>> you use on top of it.
>>
>> E.g. desktop environment/Mesa and DDX version etc...
>>
>>> Everything breaks in X: scrolling, moving windows, flickering, etc.
>>>
>>>
>>> Reverting f4c809914a7c3e4a59cf543da6c2a15d0f75ee38 and
>>> 648bc3574716400acc06f99915815f80d9563783
>>> from a 4.15 kernel makes things work again.
>>>
>>>
>>>> Backporting all the detection logic is too invasive, but you could
>>>> just go into drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c and forcefully
>>>> use the other code path.
>>>>
>>>> Just look out for "#ifdef CONFIG_SWIOTLB" checks and disable those.
>>>>
>>> Well, you really can't be serious about these suggestions, can you?
>>>
>>> Telling people to #if 0 random code is not a solution.
>> That is for testing and not a permanent solution.
>>
>>> You broke existing, working userland with your patches, so please at
>>> least fix that for 4.16.
>>>
>>> I can help test code for 4.17 and later if you wish, but that is
>>> a *different* story.
>> Please test Alex's amd-staging-drm-next branch from
>> git://people.freedesktop.org/~agd5f/linux.
> I think we're still missing something here.
>
> I'm currently running 4.16.2 + the DRM subsystem changes which are going
> into 4.17 (so I have the changes Christian is referring to) with a
> Kaveri APU, and I'm seeing similar symptoms as described by Jean-Marc.
> Some observations:
>
> Firefox, Thunderbird, or worst of all, gnome-shell, can freeze for up to
> about a minute, during which the kernel is spending most of one
> core's cycles inside alloc_pages (__alloc_pages_nodemask to be more
> precise), called from ttm_alloc_new_pages.
Philip debugged a similar problem with a KFD memory stress test about
two weeks ago, where the kernel was seemingly stuck in an infinite loop
trying to allocate huge pages. I'm pasting his analysis for the record:

> [...] it uses huge_flags GFP_TRANSHUGE to call alloc_pages(). This
> seems to be a corner case inside __alloc_pages_slowpath(): it never
> exits but takes the retry path every time. It can reclaim pages, so
> did_some_progress is set (as a result, no_progress_loops is reset to 0
> on every loop and never reaches MAX_RECLAIM_RETRIES), but it cannot
> finish the huge page allocation under this specific memory pressure.
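The retry behaviour described in the analysis above can be illustrated with a small toy model. This is only a sketch of the loop as described in the quoted text, not the kernel's code: the real __alloc_pages_slowpath() has more exit conditions, and the names slowpath_model, try_alloc and reclaim are hypothetical. MAX_RECLAIM_RETRIES is set to 16 here on the assumption that it matches the kernel constant of the time.

```python
# Toy model of the retry loop as described in the quoted analysis.
# Assumption: 16 matches the kernel's MAX_RECLAIM_RETRIES at the time.
MAX_RECLAIM_RETRIES = 16


def slowpath_model(try_alloc, reclaim, max_iterations):
    """Return (result, iterations).

    result is True if the allocation succeeded, False if the loop gave
    up after too many reclaim passes without progress, and None if it
    was still retrying when max_iterations was reached.
    try_alloc and reclaim are hypothetical callables returning bool.
    """
    no_progress_loops = 0
    for iteration in range(1, max_iterations + 1):
        if try_alloc():
            return True, iteration
        did_some_progress = reclaim()
        if did_some_progress:
            # Reclaim freed something, so the give-up counter is reset,
            # even if the freed pages cannot form a huge page.
            no_progress_loops = 0
        else:
            no_progress_loops += 1
        if no_progress_loops > MAX_RECLAIM_RETRIES:
            return False, iteration
    return None, max_iterations


# Reclaim always makes progress, but the huge page never materialises:
# the loop is still retrying when we stop watching.
print(slowpath_model(lambda: False, lambda: True, 10_000))
# Reclaim makes no progress: the loop gives up once the counter
# exceeds MAX_RECLAIM_RETRIES.
print(slowpath_model(lambda: False, lambda: False, 10_000))
```

In this model the only exits are success or a long enough run of reclaim passes without progress, so a workload where reclaim keeps freeing small pages while huge pages stay unavailable never leaves the loop.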
As a workaround to unblock our release branch testing we removed
transparent huge page allocation from ttm_get_pages. We're seeing this
as far back as 4.13 on our release branch.

If we're really talking about the same problem, I don't think it's
caused by recent page allocator changes, but rather exposed by recent
TTM changes.

Regards,
  Felix

>
> At least in the case of Firefox, this happens due to Mesa internal BO
> allocations for glTex(Sub)Image, so it's not obvious that Firefox is
> doing something wrong.
>
> I never noticed this before this week. Before, I was running 4.15.y +
> DRM subsystem changes from 4.16. Maybe something has changed in core
> code, trying harder to allocate huge pages.
>
>
> Maybe TTM should only try to use any huge pages that happen to be
> available, not spend any (/ "too much"?) additional effort trying to
> free up huge pages?
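The idea in the paragraph above, using huge pages only when they happen to be available, might look roughly like the following toy model. It is a sketch under stated assumptions, not TTM code: get_pages, try_alloc_huge_no_reclaim and alloc_single are hypothetical names, and HUGE_ORDER = 9 assumes 2 MiB huge pages built from 4 KiB pages.

```python
# Toy model of opportunistic huge page use with fallback to small pages.
# Assumption: a huge page covers 2**9 small pages (2 MiB / 4 KiB).
HUGE_ORDER = 9


def get_pages(num_pages, try_alloc_huge_no_reclaim, alloc_single):
    """Satisfy a request for num_pages small pages.

    try_alloc_huge_no_reclaim is a hypothetical callable that returns a
    huge page if one is free right now, or None; it must not reclaim or
    retry. alloc_single is a hypothetical callable returning one small
    page. Returns a list of ("huge" | "small", page) tuples.
    """
    pages_per_huge = 1 << HUGE_ORDER
    chunks = []
    remaining = num_pages
    huge_available = True
    while huge_available and remaining >= pages_per_huge:
        page = try_alloc_huge_no_reclaim()
        if page is None:
            # No huge page free: stop asking instead of trying harder.
            huge_available = False
        else:
            chunks.append(("huge", page))
            remaining -= pages_per_huge
    # Whatever is left is filled with ordinary small pages.
    while remaining > 0:
        chunks.append(("small", alloc_single()))
        remaining -= 1
    return chunks
```

The design point is that the first failed huge page attempt ends all further attempts for the request, so the cost of a fragmented system is bounded by one cheap failure rather than by reclaim.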
>
>
