LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Dmitry Safonov <dima@arista.com>
To: Lu Baolu <baolu.lu@linux.intel.com>,
	linux-kernel@vger.kernel.org, joro@8bytes.org, "Raj,
	Ashok" <ashok.raj@intel.com>
Cc: 0x7f454c46@gmail.com,
	Alex Williamson <alex.williamson@redhat.com>,
	David Woodhouse <dwmw2@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	iommu@lists.linux-foundation.org
Subject: Re: [PATCHv4 2/2] iommu/vt-d: Limit number of faults to clear in irq handler
Date: Thu, 03 May 2018 01:52:35 +0100	[thread overview]
Message-ID: <1525308755.14025.25.camel@arista.com> (raw)
In-Reply-To: <5AEA4E84.6050609@linux.intel.com>

On Thu, 2018-05-03 at 07:49 +0800, Lu Baolu wrote:
> Hi,
> 
> On 05/02/2018 08:38 PM, Dmitry Safonov wrote:
> > Hi Lu,
> > 
> > On Wed, 2018-05-02 at 14:34 +0800, Lu Baolu wrote:
> > > Hi,
> > > 
> > > On 03/31/2018 08:33 AM, Dmitry Safonov wrote:
> > > > Theoretically, on some machines faults might be generated
> > > > faster
> > > > than
> > > > they're cleared by CPU.
> > > 
> > > Is this a real case?
> > 
> > No. 1/2 is a real case and this one was discussed on v3:
> > lkml.kernel.org/r/<20180215191729.15777-1-dima@arista.com>
> > 
> > It's not possible on my hw as far as I tried, but the discussion
> > result
> > was to fix this theoretical issue too.
> 
> If faults are generated faster than CPU can clear them, the PCIe
> device should be in a very very bad state. How about disabling
> the PCIe device and ask the administrator to replace it? Anyway,
> I don't think that's goal of this patch series. :-)

Uhm, yeah, my point is not about the number of faults, but about
physical ability of iommu to generate faults faster than cpu processes
them. I might be wrong that it's not possible (like low cpu freq?)

But the number of interrupts might be high. It's like you've many
mappings on iommu and PCIe device went off. It could be just a link
flap. I think it makes sense not lockup on such occasions.

> > > >  Let's limit the cleaning-loop by number of hw
> > > > fault registers.
> > > 
> > > Will this cause the fault recording registers full of faults,
> > > hence
> > > new faults will be dropped without logging?
> > 
> > If faults come faster then they're being cleared - some of them
> > will be
> > dropped without logging. Not sure if it's worth to report all
> > faults in
> > such theoretical(!) situation.
> > If amount of reported faults for such situation is not enough and
> > it's
> > worth to keep all the faults, then probably we should introduce a
> > workqueue here (which I did in v1, but it was rejected by the
> > reason
> > that it will introduce some latency in fault reporting).
> > 
> > > And even worse, new faults will not generate interrupts?
> > 
> > They will, we clear page fault overflow outside of the loop, so any
> > new
> > fault will raise interrupt, iiuc.
> > 
> 
> I am afraid that they might not generate interrupts any more.
> 
> Say, the fault registers are full of events that are not cleared,
> then a new fault comes. There is no room for this event and
> hence the hardware might drop it silently.

AFAICS, we're doing fault-clearing in a loop inside irq handler.
That means that while we're clearing if a fault raises, it'll make
an irq level triggered (or on edge) on lapic. So, whenever we return
from the irq handler, irq will raise again.

-- 
Thanks,
             Dmitry

  reply	other threads:[~2018-05-03  0:52 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-31  0:33 [PATCHv4 1/2] iommu/vt-d: Ratelimit each dmar fault printing Dmitry Safonov
2018-03-31  0:33 ` [PATCHv4 2/2] iommu/vt-d: Limit number of faults to clear in irq handler Dmitry Safonov
2018-05-02  6:34   ` Lu Baolu
2018-05-02 12:38     ` Dmitry Safonov
2018-05-02 23:49       ` Lu Baolu
2018-05-03  0:52         ` Dmitry Safonov [this message]
2018-05-03  1:32           ` Lu Baolu
2018-05-03  1:59             ` Dmitry Safonov
2018-05-03  2:16               ` Lu Baolu
2018-05-03  2:32                 ` Lu Baolu
2018-05-03  2:34                 ` Dmitry Safonov
2018-05-03  2:44                   ` Lu Baolu
2018-05-02  2:22 ` [PATCHv4 1/2] iommu/vt-d: Ratelimit each dmar fault printing Dmitry Safonov
2018-05-03 12:40   ` Joerg Roedel
2018-05-03 16:12     ` Dmitry Safonov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1525308755.14025.25.camel@arista.com \
    --to=dima@arista.com \
    --cc=0x7f454c46@gmail.com \
    --cc=alex.williamson@redhat.com \
    --cc=ashok.raj@intel.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=dwmw2@infradead.org \
    --cc=iommu@lists.linux-foundation.org \
    --cc=joro@8bytes.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --subject='Re: [PATCHv4 2/2] iommu/vt-d: Limit number of faults to clear in irq handler' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).