LKML Archive on lore.kernel.org
From: Rob Clark <email@example.com>
To: Will Deacon <firstname.lastname@example.org>
Cc: "email@example.com:IOMMU DRIVERS <firstname.lastname@example.org>,
Joerg Roedel <email@example.com>,"
Robin Murphy <firstname.lastname@example.org>,
"moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE"
"email@example.com:IOMMU DRIVERS <firstname.lastname@example.org>,
Joerg Roedel <email@example.com>," <firstname.lastname@example.org>,
Linux Kernel Mailing List <email@example.com>
Subject: Re: [PATCH] iommu: arm-smmu: Set SCTLR.HUPCF bit
Date: Tue, 13 Nov 2018 08:12:35 -0500
Message-ID: <CAF6AEGuV2zh97iq+TgkRw0bK3VNSxJieD1N2KMW3N28h07Mfirstname.lastname@example.org>
On Tue, Nov 13, 2018 at 1:32 AM Will Deacon <email@example.com> wrote:
> On Fri, Nov 09, 2018 at 01:01:55PM -0500, Rob Clark wrote:
> > On Mon, Oct 29, 2018 at 3:09 PM Will Deacon <firstname.lastname@example.org> wrote:
> > > On Thu, Sep 27, 2018 at 06:46:07PM -0400, Rob Clark wrote:
> > > > We seem to need to set either this or CFCFG (stall), otherwise gpu
> > > > faults trigger problems with other in-flight transactions from the
> > > > GPU causing CP errors, etc.
> > > >
> > > > In the ARM SMMU spec, the 'Hit under previous context fault' bit is
> > > > described as:
> > > >
> > > > '0' - Stall or terminate subsequent transactions in the presence
> > > > of an outstanding context fault
> > > > '1' - Process all subsequent transactions independently of any
> > > > outstanding context fault.
> > > >
> > > > Since we don't enable CFCFG (stall), the behavior of terminating
> > > > other transactions makes sense. But it is probably not what we want
> > > > (and definitely not what we want for GPU).
> > > >
> > > > Signed-off-by: Rob Clark <email@example.com>
> > > > ---
> > > > So I hit this issue a long time back on 820 (msm8996) and at the
> > > > time I solved it with a patch that enabled CFCFG. And it resurfaced
> > > > more recently on sdm845. But at the time CFCFG was rejected, iirc
> > > > because of concern that it would cause problems on other non-qcom
> > > > arm smmu implementations. And I think I forgot to send this version
> > > > of the solution.
> > > >
> > > > If enabling HUPCF is anticipated to cause problems on other ARM
> > > > SMMU implementations, I think I can come up with a variant of this
> > > > patch which conditionally enables it for snapdragon.
> > > >
> > > > Either way, I'd really like to get some variant of this fix merged
> > > > (and probably it would be a good idea for stable kernel branches
> > > > too), since current behaviour with the GPU means faults turn into
> > > > a fantastic cascade of fail.
> > >
> > > Can you describe how this fantastic cascade of fail improves with this
> > > patch, please? If you're getting context faults then something has already
> > > gone horribly wrong, so I'm trying to work out how this improves things.
> > >
> > There are plenty of cases where getting iommu faults with a GPU is
> > "normal", or at least not something the kernel or even GL driver can
> > control.
> Such as? All the mainline driver does is print a diagnostic and clear the
> fault, which doesn't seem generally useful.
It is useful for debugging the fault ;-)
Although eventually we'll want to be able to do more than that, like
have the fault trigger bringing in pages of a mmap'd file and that
sort of thing.
> > With this patch, you still get the iommu fault, but it doesn't cause
> > the gpu to crash. But without it, other memory accesses in flight
> > while the fault occurs, like the GPU command-processor reading further
> > ahead in the cmdstream to set up the next draw, would return zeros,
> > causing the GPU to crash or get into a bad state.
> I get that part, but I don't understand why we're seeing faults in the first
> place and I worry that this patch is just the tip of the iceberg. It's also
> not clear that processing subsequent transactions is always the right thing
> to do in a world where we actually want to report (and handle) synchronous
> faults from devices.
Sure, it is a bug.. but it can be an application bug that is not
something the userspace GL driver or kernel could do anything about.
We shouldn't let this kill the GPU. If the application didn't have
this much control, we wouldn't need an IOMMU in the first place.
With opencl compute, the userspace controlled shader has full blown
pointers to GPU memory.
And even in cases where it is a userspace GL driver bug, having some
robustness to not completely kill the GPU makes debugging much easier.
Something I do a lot when bringing up support for a new generation of
I'm having a hard time understanding your objection to this.
Returning zeros for non-faulting transactions is a *really bad idea*.
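
For reference, a minimal sketch of what setting the bit amounts to. This is
not the patch itself: the helper name sctlr_value() is hypothetical, and the
bit positions and the baseline set of bits are assumptions taken from the
SCTLR field definitions the mainline arm-smmu driver uses, so check them
against the SMMU spec before relying on them.

```c
#include <stdint.h>

/* Context bank SCTLR bits (positions assumed, see note above). */
#define SCTLR_M      (1u << 0)  /* MMU (translation) enable */
#define SCTLR_TRE    (1u << 1)  /* TEX remap enable */
#define SCTLR_AFE    (1u << 2)  /* access flag enable */
#define SCTLR_CFRE   (1u << 5)  /* context fault report enable */
#define SCTLR_CFIE   (1u << 6)  /* context fault interrupt enable */
#define SCTLR_CFCFG  (1u << 7)  /* stall (rather than terminate) on fault */
#define SCTLR_HUPCF  (1u << 8)  /* hit under previous context fault */

/*
 * Hypothetical helper: build the SCTLR value for a context bank.
 * With hupcf == 0, transactions issued while a context fault is
 * outstanding are stalled or terminated. With hupcf != 0 they are
 * processed independently of the outstanding fault.
 */
static uint32_t sctlr_value(int hupcf)
{
	uint32_t reg = SCTLR_CFIE | SCTLR_CFRE | SCTLR_AFE |
		       SCTLR_TRE | SCTLR_M;

	if (hupcf)
		reg |= SCTLR_HUPCF;

	return reg;
}
```

Note that CFCFG is left clear in both cases, so the faulting transaction
itself is still terminated and reported. Only the handling of the other
in-flight transactions changes.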
 yes, vc4 did without an IOMMU.. but it also had to do some fairly
complex shader analysis in the kernel, and it was restricted to gles2
feature set, so no real flow control, and no memory access beyond
Thread overview: 9+ messages
2018-09-27 22:46 Rob Clark
2018-10-29 19:10 ` Will Deacon
2018-11-09 18:01 ` Rob Clark
2018-11-13 6:32 ` Will Deacon
2018-11-13 13:12 ` Rob Clark [this message]
2018-11-26 19:31 ` Will Deacon
2018-11-26 20:38 ` Jordan Crouse
2018-11-26 20:56 ` Rob Clark
2019-05-24 18:38 ` Rob Clark