LKML Archive on lore.kernel.org
From: "Michel Dänzer" <michel@daenzer.net>
To: "Felix Kuehling" <felix.kuehling@amd.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Gabriel C" <nix.or.die@gmail.com>,
	"Philip Yang" <Philip.Yang@amd.com>
Cc: Jean-Marc Valin <jmvalin@mozilla.com>,
	Dave Airlie <airlied@linux.ie>,
	LKML <linux-kernel@vger.kernel.org>,
	dri-devel@lists.freedesktop.org, alexander.deucher@amd.com,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: AMD graphics performance regression in 4.15 and later
Date: Mon, 23 Apr 2018 12:23:49 +0200
Message-ID: <45fc9cac-f37e-833f-5f4a-811db206166d@daenzer.net>
In-Reply-To: <35c599a3-0042-4f00-52e4-9d17164b93b1@amd.com>

On 2018-04-20 09:40 PM, Felix Kuehling wrote:
> On 2018-04-20 10:47 AM, Michel Dänzer wrote:
>> On 2018-04-11 11:37 AM, Christian König wrote:
>>> Am 11.04.2018 um 06:00 schrieb Gabriel C:
>>>> 2018-04-09 11:42 GMT+02:00 Christian König
>>>> <ckoenig.leichtzumerken@gmail.com>:
>>>>> Am 07.04.2018 um 00:00 schrieb Jean-Marc Valin:
>>>>>> Hi Christian,
>>>>>>
>>>>>> Thanks for the info. FYI, I've also opened a Firefox bug for that at:
>>>>>> https://bugzilla.mozilla.org/show_bug.cgi?id=1448778
>>>>>> Feel free to comment since you have a better understanding of what's
>>>>>> going on.
>>>>>>
>>>>>> One last question: right now I'm running 4.15.0 with the "offending"
>>>>>> patch reverted. Is that safe to run or are there possible bad
>>>>>> interactions with other changes.
>>>>> That should work without problems.
>>>>>
>>>>> But I just had another idea as well: if you want, you could still test
>>>>> the new code path which will be used in 4.17.
>>>>>
>>>> While Firefox may do some strange things, this is not only about Firefox.
>>>>
>>>> With your patches my EPYC box is unusable with 4.15++ kernels.
>>>> The whole desktop is acting weird. This one is using
>>>> a Cape Verde PRO [Radeon HD 7750/8740 / R7 250E] GPU.
>>>>
>>>> Box is 2 * EPYC 7281 with 128 GB ECC RAM.
>>>>
>>>> Also a 14C Xeon box with a HD7700 is broken the same way.
>>> The hardware is irrelevant for this. We need to know what software stack
>>> you use on top of it.
>>>
>>> E.g. desktop environment/Mesa and DDX version etc...
>>>
>>>> Everything breaks in X: scrolling, moving windows, flickering, etc.
>>>>
>>>>
>>>> Reverting f4c809914a7c3e4a59cf543da6c2a15d0f75ee38 and
>>>> 648bc3574716400acc06f99915815f80d9563783
>>>> from a 4.15 kernel makes things work again.
>>>>
>>>>
>>>>> Backporting all the detection logic is too invasive, but you could
>>>>> just go into drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c and forcefully
>>>>> use the other code path.
>>>>>
>>>>> Just look out for "#ifdef CONFIG_SWIOTLB" checks and disable those.
>>>>>
>>>> Well, you really can't be serious about these suggestions? Are you?
>>>>
>>>> Telling people to #if 0 random code is not a solution.
>>> That is for testing and not a permanent solution.
>>>
>>>> You broke existing working userland with your patches, so please at
>>>> least fix that for 4.16.
>>>>
>>>> I can help testing code for 4.17/++ if you wish, but that is a
>>>> *different* story.
>>> Please test Alex's amd-staging-drm-next branch from
>>> git://people.freedesktop.org/~agd5f/linux.
>> I think we're still missing something here.
>>
>> I'm currently running 4.16.2 + the DRM subsystem changes which are going
>> into 4.17 (so I have the changes Christian is referring to) with a
>> Kaveri APU, and I'm seeing similar symptoms as described by Jean-Marc.
>> Some observations:
>>
>> Firefox, Thunderbird, or worst of all, gnome-shell, can freeze for up to
>> about a minute, during which the kernel is spending most of one
>> core's cycles inside alloc_pages (__alloc_pages_nodemask to be more
>> precise), called from ttm_alloc_new_pages.
> Philip debugged a similar problem with a KFD memory stress test about
> two weeks ago, where the kernel was seemingly stuck in an infinite loop
> trying to allocate huge pages. I'm pasting his analysis for the record:
> 
>> [...] it uses huge_flags GFP_TRANSHUGE to call alloc_pages(). This
>> seems to be a corner case inside __alloc_pages_slowpath(): it never
>> exits but goes to the retry path every time. It can reclaim pages and
>> did_some_progress is set (as a result, no_progress_loops is reset to 0
>> on every loop and never reaches MAX_RECLAIM_RETRIES), but it cannot
>> finish huge page allocations under this specific memory pressure.
> As a workaround to unblock our release branch testing we removed
> transparent huge page allocation from ttm_get_pages. We're seeing this
> as far back as 4.13 on our release branch.

Thanks for sharing this. In the future, please raise issues like this on
the public mailing lists from the beginning.


> If we're really talking about the same problem, I don't think it's
> caused by recent page allocator changes, but rather exposed by recent
> TTM changes.

It sounds related, but probably not exactly the same problem. I already
had the TTM code using GFP_TRANSHUGE before I ran into the issue. Also,
__alloc_pages_slowpath eventually succeeds for me, it can just take up
to about a minute.

I'm currently testing using (GFP_TRANSHUGE_LIGHT | __GFP_NORETRY)
instead of GFP_TRANSHUGE in TTM.


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer

