LKML Archive on lore.kernel.org
From: Roger Heflin <rogerheflin@gmail.com>
To: "McKay, Luke" <Luke.McKay@aeroflex.com>
Cc: Andrey Utkin <andrey.utkin@corp.bluecherry.net>,
	Andrey Utkin <andrey.krieger.utkin@gmail.com>,
	Stephen Hemminger <stephen@networkplumber.org>,
	"kernel-mentors@selenic.com" <kernel-mentors@selenic.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	kernelnewbies <kernelnewbies@kernelnewbies.org>
Subject: Re: Question on MSI support in PCI and PCI-E devices
Date: Wed, 4 Mar 2015 11:18:07 -0600
Message-ID: <CAAMCDedc31u_M3N9eqoL+L1s9Z4rms8Jkn07q7QU=-EX4a5+Mw@mail.gmail.com>
In-Reply-To: <41F544F51A709A4AA2914581E1219BDCBE5016@SEC-MBX-01.aeroflex.corp>

We verified that the exact same device worked with the previous CPU in
the same motherboard/BIOS combination and the same OS/kernel
combination. The only change we identified was an Ivy Bridge CPU in
place of a Sandy Bridge one, in the same motherboard/BIOS/board firmware.

And in this case only one device driver/PCI board was using the given
interrupt. The hardware vendor for the PCI board debugged a firmware
dump to determine what state the firmware was in, and it was waiting
for an INTx that never came. Switching to MSI has resulted in things
working reliably.

On Wed, Mar 4, 2015 at 11:04 AM, McKay, Luke <Luke.McKay@aeroflex.com> wrote:
> Legacy INTx can be shared amongst multiple devices.  Since it is a level-sensitive simulation of the interrupt line, it only takes one device (or driver) to forget to clear the interrupt, and then the line is stuck and won't work for any of the devices using it.
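The shared, level-sensitive behavior described above can be modeled in a few lines of userspace C. This is a toy illustration, not driver code: struct intx_dev and the helper names are hypothetical, and each device's interrupt condition is reduced to a single flag. The line counts as asserted while any sharer still asserts it, so one device whose condition is never cleared keeps the line asserted regardless of what the other drivers do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one flag per device sharing a level-sensitive INTx line.
 * On PCIe the "wire" is emulated with Assert_INTx/Deassert_INTx messages. */
struct intx_dev {
    bool pending;   /* device still has an uncleared interrupt condition */
};

/* Wired-OR: the shared line stays asserted while ANY sharer asserts it. */
bool intx_line_asserted(const struct intx_dev *devs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (devs[i].pending)
            return true;
    return false;
}

/* What a well-behaved driver's handler does: clear its own device's cause. */
void intx_service(struct intx_dev *dev)
{
    assert(dev != NULL);
    dev->pending = false;
}
```

With two sharers both pending, servicing only the first leaves intx_line_asserted() true; it becomes false only once the second is serviced too. On Linux, a line that stays asserted with no handler claiming it is eventually disabled by the spurious-interrupt detection ("irq N: nobody cared"), after which no device on that line gets interrupts.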
>
> If you're working with one particular device that seems to be causing these sorts of problems, then you can verify misbehaving hardware with a PCIe analyzer.  With the analyzer you can confirm that, when the driver informs the device that it has processed the interrupt, the device sends the deassertion message for the INTx line.
>
> Or, if that isn't available, simply verify that the driver's clearing of the interrupt on the end device takes effect, and then verify the chain of propagation that clears the interrupt status: through any switch in the path, to the root port and the legacy PCI interrupt controller where the interrupt is cleared, and on to the top-level interrupt controller.
>
> Regards,
> Luke
>
> --
> Luke McKay
> Senior Engineer
> Cobham AvComm
> T : +1 (316) 529 5585
>
> Please consider the environment before printing this email
>
>
>
> -----Original Message-----
> From: Roger Heflin [mailto:rogerheflin@gmail.com]
> Sent: Wednesday, March 04, 2015 10:31 AM
> To: McKay, Luke
> Cc: Andrey Utkin; Andrey Utkin; Stephen Hemminger; kernel-mentors@selenic.com; linux-kernel@vger.kernel.org; kernelnewbies
> Subject: Re: Question on MSI support in PCI and PCI-E devices
>
> I know from some data I have seen that, between Intel Sandy Bridge and Intel Ivy Bridge, the same motherboards stopped delivering INTx reliably (an interrupt was lost under load roughly once every 30 days, and the driver and
> firmware had no method to recover from the failure).  We had to transition
> to using MSI on some PCI cards that had this issue. Our issue was duplicated on a large number of different physical machines, so if it was a hardware error, then a lot of different physical machines had the defect.
>
> On Wed, Mar 4, 2015 at 10:03 AM, McKay, Luke <Luke.McKay@aeroflex.com> wrote:
>> I don't personally know of any PCI drivers that use polling instead of interrupts, since that would really mean the hardware is broken.
>>
>> Basically all you need to do is create a timer and have its callback set to a driver routine that checks the device status registers to determine whether there is work to be done.  The status register(s) would be the same indicators that should have generated an interrupt.
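The timer-based polling described above can be sketched as a userspace model. It rests on stated assumptions and is not kernel code: the status bits, struct fake_dev, and every function name are hypothetical, and the "register" is a plain struct field rather than MMIO. In a driver on current kernels the poll would be a struct timer_list set up with timer_setup() and re-armed with mod_timer(), and the register would be accessed with readl()/writel(). The design point is that one service routine checks the status bits that should have raised the interrupt, and both the interrupt path and the poll call it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status bits; a real board's datasheet defines the layout. */
#define STATUS_RX_DONE (1u << 0)
#define STATUS_TX_DONE (1u << 1)
#define STATUS_ANY     (STATUS_RX_DONE | STATUS_TX_DONE)

/* Userspace stand-in for the device; status models a write-1-to-clear register. */
struct fake_dev {
    uint32_t status;
    unsigned rx_handled;
    unsigned tx_handled;
};

uint32_t fake_read_status(const struct fake_dev *d) { return d->status; }

/* Write-1-to-clear: only the bits written as 1 are cleared. */
void fake_ack_status(struct fake_dev *d, uint32_t bits) { d->status &= ~bits; }

/*
 * Shared service routine: looks at the same status bits that should have
 * raised the interrupt, handles them, and acknowledges only what it saw.
 * Returns the number of event types handled (0 = nothing was pending).
 */
int service_device(struct fake_dev *d)
{
    assert(d != NULL);
    uint32_t st = fake_read_status(d) & STATUS_ANY;
    int handled = 0;

    if (st & STATUS_RX_DONE) {
        d->rx_handled++;
        handled++;
    }
    if (st & STATUS_TX_DONE) {
        d->tx_handled++;
        handled++;
    }
    fake_ack_status(d, st);
    return handled;
}

/* Interrupt path (the IRQ handler in a real driver). */
int on_interrupt(struct fake_dev *d) { return service_device(d); }

/*
 * Poll path (a timer callback in a real driver, which would then re-arm
 * itself, e.g. with mod_timer(); re-arming is not modeled here).
 */
int on_poll_tick(struct fake_dev *d) { return service_device(d); }
```

Because the poll only acts when a status bit is set, it can also run alongside the interrupt as a slow safety net, letting a driver recover from a lost INTx instead of waiting forever; work found by the poll that no interrupt reported points to dropped interrupts rather than a hung device. In a real driver the two paths can run concurrently, so service_device() would need serializing, for example with a spinlock taken via spin_lock_irqsave().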
>>
>> Regards,
>> Luke
>>
>> -----Original Message-----
>> From: Andrey Utkin [mailto:andrey.utkin@corp.bluecherry.net]
>> Sent: Tuesday, March 03, 2015 8:29 AM
>> To: McKay, Luke
>> Cc: Andrey Utkin; Stephen Hemminger; kernel-mentors@selenic.com;
>> linux-kernel@vger.kernel.org; kernelnewbies
>> Subject: Re: Question on MSI support in PCI and PCI-E devices
>>
>> On Mon, Mar 2, 2015 at 4:02 PM, McKay, Luke <Luke.McKay@aeroflex.com> wrote:
>>> It doesn't appear that your device supports MSI.  If it did, lspci -v should list the MSI capability and whether or not it is enabled.
>>>
>>> e.g. something like...
>>> Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
>>>
>>> Without a listing that shows the capability is present, there is nothing to enable.
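The same information can be read without lspci. The sketch below is a minimal userspace decoder, assuming a 256-byte config-space dump, for example the config file under /sys/bus/pci/devices/<domain:bus:dev.fn>/ read as root (unprivileged reads stop after the first 64 bytes, before most capabilities). It walks the capability list from the pointer at offset 0x34, as lspci does, and decodes the MSI Message Control word. Offsets and bit positions follow the PCI specification; struct msi_info and the function names are made up for illustration. It only looks for MSI (capability ID 0x05); MSI-X uses ID 0x11 and is not decoded here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Offsets and IDs from the PCI spec (same values as the kernel's pci_regs.h). */
#define PCI_STATUS          0x06
#define PCI_STATUS_CAP_LIST 0x10
#define PCI_CAPABILITY_LIST 0x34
#define PCI_CAP_ID_MSI      0x05

struct msi_info {
    uint8_t  offset;     /* config-space offset of the MSI capability */
    bool     enabled;    /* Message Control bit 0: MSI Enable */
    unsigned count_en;   /* vectors enabled: 1 << bits 6:4 */
    unsigned count_cap;  /* vectors supported: 1 << bits 3:1 */
    bool     is64;       /* bit 7: 64-bit address capable */
    bool     maskable;   /* bit 8: per-vector masking capable */
};

static uint16_t rd16(const uint8_t *cfg, unsigned off)
{
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));   /* little-endian */
}

/* Walk the capability list of a 256-byte config dump; true if MSI is present. */
bool find_msi(const uint8_t cfg[256], struct msi_info *out)
{
    assert(cfg != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    if (!(rd16(cfg, PCI_STATUS) & PCI_STATUS_CAP_LIST))
        return false;                          /* no capability list at all */

    unsigned ptr = cfg[PCI_CAPABILITY_LIST] & 0xFC;
    for (int ttl = 48; ptr >= 0x40 && ttl > 0; ttl--) {   /* guard against loops */
        if (cfg[ptr] == PCI_CAP_ID_MSI) {
            uint16_t ctrl = rd16(cfg, ptr + 2);
            out->offset    = (uint8_t)ptr;
            out->enabled   = ctrl & 0x0001;
            out->count_cap = 1u << ((ctrl >> 1) & 0x7);
            out->count_en  = 1u << ((ctrl >> 4) & 0x7);
            out->is64      = ctrl & 0x0080;
            out->maskable  = ctrl & 0x0100;
            return true;
        }
        ptr = cfg[ptr + 1] & 0xFC;             /* next capability */
    }
    return false;
}

/* Render the result in the same shape as the lspci -v line quoted above. */
int format_msi(char *buf, size_t len, const struct msi_info *m)
{
    return snprintf(buf, len,
                    "Capabilities: [%02x] MSI: Enable%c Count=%u/%u Maskable%c 64bit%c",
                    (unsigned)m->offset, m->enabled ? '+' : '-',
                    m->count_en, m->count_cap,
                    m->maskable ? '+' : '-', m->is64 ? '+' : '-');
}
```

If find_msi() returns false for a board, there is no MSI capability to enable. A driver does not parse this by hand; on current kernels it would request MSI through pci_alloc_irq_vectors() with the PCI_IRQ_MSI flag.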
>>>
>>> Have you tried polling instead of using interrupts?  Definitely not ideal, but it might help you to determine whether hardware is dropping/missing an interrupt or whether the hardware is being completely hung up.
>>>
>>> Do you know if this missing interrupt is occurring in other systems as well?  How about whether it happens with different boards in the same system?  Answers to these questions would help to determine whether you might have a defective board, or some sort of incompatibility with the system.
>>
>> We have just three setups reproducing this. We have no boards for replacement experiments, unfortunately.
>> Polling instead of using interrupts sounds interesting. Is there an example of such usage in any other PCI device driver?
>>
>> --
>> Bluecherry developer.
>>
>>
>> Aeroflex is now a Cobham company
>

Thread overview: 9+ messages
     [not found] <f236217608b24a5e976628fe31d41a03@BRMWP-EXMB11.corp.brocade.com>
2015-02-12 14:48 ` Stephen Hemminger
2015-02-12 15:11   ` Andrey Utkin
2015-03-02 14:02     ` McKay, Luke
2015-03-03 14:29       ` Andrey Utkin
2015-03-04 16:03         ` McKay, Luke
2015-03-04 16:30           ` Roger Heflin
2015-03-04 17:04             ` McKay, Luke
2015-03-04 17:18               ` Roger Heflin [this message]
2015-02-11 18:19 Andrey Utkin
