LKML Archive on lore.kernel.org
* Bad rss-counter state from drm/ttm, drm/vmwgfx: Support huge TTM pagefaults
From: Alex Xu (Hello71) @ 2020-04-06 19:51 UTC (permalink / raw)
  To: linux-mm, dri-devel, linux-kernel, thomas_os
  Cc: pv-drivers, linux-graphics-maintainer, Andrew Morton,
	Michal Hocko, Matthew Wilcox (Oracle),
	Kirill A. Shutemov, Ralph Campbell, Jérôme Glisse,
	Christian König, Dan Williams, Roland Scheidegger

Using 314b658 with amdgpu, starting sway and firefox causes "BUG: Bad 
rss-counter state" and "BUG: non-zero pgtables_bytes on freeing mm" 
messages to fill dmesg. Closing programs then causes more BUGs and 
hangs, until everything grinds to a halt (I can't start more programs, 
and can't even reboot through systemd).

Using master and reverting that branch up to that point fixes the 
problem.

I'm using a Ryzen 1600 and AMD Radeon RX 480 on an ASRock B450 Pro4 
board with IOMMU enabled.


* Re: Bad rss-counter state from drm/ttm, drm/vmwgfx: Support huge TTM pagefaults
From: Thomas Hellström (VMware) @ 2020-04-06 20:25 UTC (permalink / raw)
  To: Alex Xu (Hello71), linux-mm, dri-devel, linux-kernel
  Cc: pv-drivers, linux-graphics-maintainer, Andrew Morton,
	Michal Hocko, Matthew Wilcox (Oracle),
	Kirill A. Shutemov, Ralph Campbell, Jérôme Glisse,
	Christian König, Dan Williams, Roland Scheidegger

On 4/6/20 9:51 PM, Alex Xu (Hello71) wrote:
> Using 314b658 with amdgpu, starting sway and firefox causes "BUG: Bad
> rss-counter state" and "BUG: non-zero pgtables_bytes on freeing mm" to
> start filling dmesg, and then closing programs causes more BUGs and
> hangs, and then everything grinds to a halt (can't start more programs,
> can't even reboot through systemd).
>
> Using master and reverting that branch up to that point fixes the
> problem.
>
> I'm using a Ryzen 1600 and AMD Radeon RX 480 on an ASRock B450 Pro4
> board with IOMMU enabled.

Hmm. That sounds bad. Could you send a copy of your config?

Meanwhile, I'll prepare a small patch that disables huge_fault() for 
the non-vmwgfx drivers until we've figured out what's happening.

/Thomas




* Re: Bad rss-counter state from drm/ttm, drm/vmwgfx: Support huge TTM pagefaults
From: Thomas Hellström (VMware) @ 2020-04-06 21:04 UTC (permalink / raw)
  To: Alex Xu (Hello71), linux-mm, dri-devel, linux-kernel
  Cc: pv-drivers, linux-graphics-maintainer, Andrew Morton,
	Michal Hocko, Matthew Wilcox (Oracle),
	Kirill A. Shutemov, Ralph Campbell, Jérôme Glisse,
	Christian König, Dan Williams, Roland Scheidegger

[-- Attachment #1: Type: text/plain, Size: 631 bytes --]

Hi,

On 4/6/20 9:51 PM, Alex Xu (Hello71) wrote:
> Using 314b658 with amdgpu, starting sway and firefox causes "BUG: Bad
> rss-counter state" and "BUG: non-zero pgtables_bytes on freeing mm" to
> start filling dmesg, and then closing programs causes more BUGs and
> hangs, and then everything grinds to a halt (can't start more programs,
> can't even reboot through systemd).
>
> Using master and reverting that branch up to that point fixes the
> problem.
>
> I'm using a Ryzen 1600 and AMD Radeon RX 480 on an ASRock B450 Pro4
> board with IOMMU enabled.

If you could try the attached patch, that'd be great!

Thanks,

Thomas



[-- Attachment #2: 0001-drm-ttm-Temporarily-disable-the-huge_fault-callback.patch --]
[-- Type: text/x-patch, Size: 2774 bytes --]

From b630b9b4dcc1d01514d97a84cbb7f0cb85333154 Mon Sep 17 00:00:00 2001
From: "Thomas Hellstrom (VMware)" <thomas_os@shipmail.org>
Date: Mon, 6 Apr 2020 22:55:13 +0200
Subject: [PATCH] drm/ttm: Temporarily disable the huge_fault() callback

Signed-off-by: Thomas Hellstrom (VMware) <thomas_os@shipmail.org>
---
 drivers/gpu/drm/ttm/ttm_bo_vm.c | 63 ---------------------------------
 1 file changed, 63 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 6ee3b96f0d13..0ad30b112982 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -442,66 +442,6 @@ vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
 }
 EXPORT_SYMBOL(ttm_bo_vm_fault);
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-/**
- * ttm_pgprot_is_wrprotecting - Is a page protection value write-protecting?
- * @prot: The page protection value
- *
- * Return: true if @prot is write-protecting. false otherwise.
- */
-static bool ttm_pgprot_is_wrprotecting(pgprot_t prot)
-{
-	/*
-	 * This is meant to say "pgprot_wrprotect(prot) == prot" in a generic
-	 * way. Unfortunately there is no generic pgprot_wrprotect.
-	 */
-	return pte_val(pte_wrprotect(__pte(pgprot_val(prot)))) ==
-		pgprot_val(prot);
-}
-
-static vm_fault_t ttm_bo_vm_huge_fault(struct vm_fault *vmf,
-				       enum page_entry_size pe_size)
-{
-	struct vm_area_struct *vma = vmf->vma;
-	pgprot_t prot;
-	struct ttm_buffer_object *bo = vma->vm_private_data;
-	vm_fault_t ret;
-	pgoff_t fault_page_size = 0;
-	bool write = vmf->flags & FAULT_FLAG_WRITE;
-
-	switch (pe_size) {
-	case PE_SIZE_PMD:
-		fault_page_size = HPAGE_PMD_SIZE >> PAGE_SHIFT;
-		break;
-#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
-	case PE_SIZE_PUD:
-		fault_page_size = HPAGE_PUD_SIZE >> PAGE_SHIFT;
-		break;
-#endif
-	default:
-		WARN_ON_ONCE(1);
-		return VM_FAULT_FALLBACK;
-	}
-
-	/* Fallback on write dirty-tracking or COW */
-	if (write && ttm_pgprot_is_wrprotecting(vma->vm_page_prot))
-		return VM_FAULT_FALLBACK;
-
-	ret = ttm_bo_vm_reserve(bo, vmf);
-	if (ret)
-		return ret;
-
-	prot = vm_get_page_prot(vma->vm_flags);
-	ret = ttm_bo_vm_fault_reserved(vmf, prot, 1, fault_page_size);
-	if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
-		return ret;
-
-	dma_resv_unlock(bo->base.resv);
-
-	return ret;
-}
-#endif
-
 void ttm_bo_vm_open(struct vm_area_struct *vma)
 {
 	struct ttm_buffer_object *bo = vma->vm_private_data;
@@ -604,9 +544,6 @@ static const struct vm_operations_struct ttm_bo_vm_ops = {
 	.open = ttm_bo_vm_open,
 	.close = ttm_bo_vm_close,
 	.access = ttm_bo_vm_access,
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	.huge_fault = ttm_bo_vm_huge_fault,
-#endif
 };
 
 static struct ttm_buffer_object *ttm_bo_vm_lookup(struct ttm_bo_device *bdev,
-- 
2.21.1



* Re: Bad rss-counter state from drm/ttm, drm/vmwgfx: Support huge TTM pagefaults
From: Alex Xu (Hello71) @ 2020-04-07  0:38 UTC (permalink / raw)
  To: dri-devel, linux-kernel, linux-mm, Thomas Hellström (VMware)
  Cc: Andrew Morton, Christian König, Dan Williams,
	Jérôme Glisse, Kirill A. Shutemov,
	linux-graphics-maintainer, Michal Hocko, pv-drivers,
	Ralph Campbell, Roland Scheidegger, Matthew Wilcox (Oracle)

Excerpts from Thomas Hellström (VMware)'s message of April 6, 2020 5:04 pm:
> Hi,
> 
> On 4/6/20 9:51 PM, Alex Xu (Hello71) wrote:
>> Using 314b658 with amdgpu, starting sway and firefox causes "BUG: Bad
>> rss-counter state" and "BUG: non-zero pgtables_bytes on freeing mm" to
>> start filling dmesg, and then closing programs causes more BUGs and
>> hangs, and then everything grinds to a halt (can't start more programs,
>> can't even reboot through systemd).
>>
>> Using master and reverting that branch up to that point fixes the
>> problem.
>>
>> I'm using a Ryzen 1600 and AMD Radeon RX 480 on an ASRock B450 Pro4
>> board with IOMMU enabled.
> 
> If you could try the attached patch, that'd be great!
> 
> Thanks,
> 
> Thomas
> 

Yeah, that works too. Kernel config sent off-list.

Regards,
Alex.


* Re: Bad rss-counter state from drm/ttm, drm/vmwgfx: Support huge TTM pagefaults
From: Thomas Hellström (VMware) @ 2020-04-07 11:26 UTC (permalink / raw)
  To: Alex Xu (Hello71), dri-devel, linux-kernel, linux-mm
  Cc: Andrew Morton, Christian König, Dan Williams,
	Jérôme Glisse, Kirill A. Shutemov,
	linux-graphics-maintainer, Michal Hocko, pv-drivers,
	Ralph Campbell, Roland Scheidegger, Matthew Wilcox (Oracle)

On 4/7/20 2:38 AM, Alex Xu (Hello71) wrote:
> Excerpts from Thomas Hellström (VMware)'s message of April 6, 2020 5:04 pm:
>> Hi,
>>
>> On 4/6/20 9:51 PM, Alex Xu (Hello71) wrote:
>>> Using 314b658 with amdgpu, starting sway and firefox causes "BUG: Bad
>>> rss-counter state" and "BUG: non-zero pgtables_bytes on freeing mm" to
>>> start filling dmesg, and then closing programs causes more BUGs and
>>> hangs, and then everything grinds to a halt (can't start more programs,
>>> can't even reboot through systemd).
>>>
>>> Using master and reverting that branch up to that point fixes the
>>> problem.
>>>
>>> I'm using a Ryzen 1600 and AMD Radeon RX 480 on an ASRock B450 Pro4
>>> board with IOMMU enabled.
>> If you could try the attached patch, that'd be great!
>>
>> Thanks,
>>
>> Thomas
>>
> Yeah, that works too. Kernel config sent off-list.
>
> Regards,
> Alex.

Thanks. Do you want me to add your Reported-by: and Tested-by: tags
to this patch?

/Thomas



* Re: Bad rss-counter state from drm/ttm, drm/vmwgfx: Support huge TTM pagefaults
From: Alex Xu (Hello71) @ 2020-04-07 15:36 UTC (permalink / raw)
  To: dri-devel, linux-kernel, linux-mm, Thomas Hellström (VMware)
  Cc: Andrew Morton, Christian König, Dan Williams,
	Jérôme Glisse, Kirill A. Shutemov,
	linux-graphics-maintainer, Michal Hocko, pv-drivers,
	Ralph Campbell, Roland Scheidegger, Matthew Wilcox (Oracle)

Excerpts from Thomas Hellström (VMware)'s message of April 7, 2020 7:26 am:
> On 4/7/20 2:38 AM, Alex Xu (Hello71) wrote:
>> Excerpts from Thomas Hellström (VMware)'s message of April 6, 2020 5:04 pm:
>>> Hi,
>>>
>>> On 4/6/20 9:51 PM, Alex Xu (Hello71) wrote:
>>>> Using 314b658 with amdgpu, starting sway and firefox causes "BUG: Bad
>>>> rss-counter state" and "BUG: non-zero pgtables_bytes on freeing mm" to
>>>> start filling dmesg, and then closing programs causes more BUGs and
>>>> hangs, and then everything grinds to a halt (can't start more programs,
>>>> can't even reboot through systemd).
>>>>
>>>> Using master and reverting that branch up to that point fixes the
>>>> problem.
>>>>
>>>> I'm using a Ryzen 1600 and AMD Radeon RX 480 on an ASRock B450 Pro4
>>>> board with IOMMU enabled.
>>> If you could try the attached patch, that'd be great!
>>>
>>> Thanks,
>>>
>>> Thomas
>>>
>> Yeah, that works too. Kernel config sent off-list.
>>
>> Regards,
>> Alex.
> 
> Thanks. Do you want me to add your
> 
> Reported-by: and Tested-by: To this patch?
> 
> /Thomas
> 
> 

Sure. Shouldn't we fix it properly though?


* Re: Bad rss-counter state from drm/ttm, drm/vmwgfx: Support huge TTM pagefaults
From: Thomas Hellström (VMware) @ 2020-04-07 19:57 UTC (permalink / raw)
  To: Alex Xu (Hello71), dri-devel, linux-kernel, linux-mm
  Cc: Andrew Morton, Christian König, Dan Williams,
	Jérôme Glisse, Kirill A. Shutemov,
	linux-graphics-maintainer, Michal Hocko, pv-drivers,
	Ralph Campbell, Roland Scheidegger, Matthew Wilcox (Oracle)

On 4/7/20 5:36 PM, Alex Xu (Hello71) wrote:
> Excerpts from Thomas Hellström (VMware)'s message of April 7, 2020 7:26 am:
>> On 4/7/20 2:38 AM, Alex Xu (Hello71) wrote:
>>> Excerpts from Thomas Hellström (VMware)'s message of April 6, 2020 5:04 pm:
>>>> Hi,
>>>>
>>>> On 4/6/20 9:51 PM, Alex Xu (Hello71) wrote:
>>>>> Using 314b658 with amdgpu, starting sway and firefox causes "BUG: Bad
>>>>> rss-counter state" and "BUG: non-zero pgtables_bytes on freeing mm" to
>>>>> start filling dmesg, and then closing programs causes more BUGs and
>>>>> hangs, and then everything grinds to a halt (can't start more programs,
>>>>> can't even reboot through systemd).
>>>>>
>>>>> Using master and reverting that branch up to that point fixes the
>>>>> problem.
>>>>>
>>>>> I'm using a Ryzen 1600 and AMD Radeon RX 480 on an ASRock B450 Pro4
>>>>> board with IOMMU enabled.
>>>> If you could try the attached patch, that'd be great!
>>>>
>>>> Thanks,
>>>>
>>>> Thomas
>>>>
>>> Yeah, that works too. Kernel config sent off-list.
>>>
>>> Regards,
>>> Alex.
>> Thanks. Do you want me to add your
>>
>> Reported-by: and Tested-by: To this patch?
>>
>> /Thomas
>>
>>
> Sure. Shouldn't we fix it properly though?

It's still enabled for vmwgfx, for which it is reasonably well tested 
and where I can't see any such errors.

The code we remove with this patch enables huge page-table entries in 
some circumstances for other drivers, but given the problems you're 
seeing for amdgpu, it's better to enable this on a per-driver basis 
after thorough testing. Since I don't have amdgpu hardware, I'm not 
sure what it's doing differently and can't debug the issue properly.

/Thomas




