From: Dexuan Cui <decui@microsoft.com>
To: Saeed Mahameed <saeed@kernel.org>, Leon Romanovsky <leon@kernel.org>
Cc: "linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"'netdev@vger.kernel.org'" <netdev@vger.kernel.org>,
	"'x86@kernel.org'" <x86@kernel.org>,
	Haiyang Zhang <haiyangz@microsoft.com>,
	"'linux-kernel@vger.kernel.org'" <linux-kernel@vger.kernel.org>
Subject: RE: [5.14-rc1] mlx5_core receives no interrupts with maxcpus=8
Date: Mon, 19 Jul 2021 20:33:30 +0000
Message-ID: <BYAPR21MB127077DE03164CA31AE0B33DBFE19@BYAPR21MB1270.namprd21.prod.outlook.com>
In-Reply-To: <c61af64fd275b3a329bbad699de9db661e3cf082.camel@kernel.org>

> From: Saeed Mahameed <saeed@kernel.org>
> Sent: Monday, July 19, 2021 1:18 PM
> > > ...
> > > It turns out that adding "intremap=off" can work around the issue!
> > >
> > > The root cause is still not clear yet. I don't know why Windows is
> > > good here.
> >
> > The card is stuck in the FW, maybe Saeed knows why. I tried your
> > scenario and it worked for me.
> >
> > Thanks
> 
> I don't think the FW is stuck since we see the cmd completion after
> timeout, this means that the 1st interrupt from the device got lost.
> 
> "wait_func_handle_exec_timeout:1062:(pid 1416): cmd[0]:
> CREATE_EQ(0x301) recovered after timeout"
> 
> the fact that this happens on  5.14 and 5.4 kernels and the issue is
> worked around via bringing the cpus online, or disabling intremap,
> means that there is something wrong with the interrupt remapping
> mechanism, maybe the interrupt is being delivered on an offline cpu ?
> is this a qemu/VM guest or a bare metal host ?

Thanks for the replies! 

This is a bare metal x86-64 host with Intel CPUs. Yes, I believe the
issue is in the IOMMU Interrupt Remapping mechanism rather than in the
NIC driver. I just don't understand why bringing the CPUs online and
offline can work around the issue. I'm trying to dump the IOMMU IR
table entries to look for any errors.
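
One way to test the "interrupt delivered to an offline CPU" hypothesis
quoted above is to compare each IRQ's effective affinity mask with the
list of online CPUs. The following is only a minimal sketch under stated
assumptions, not something that was run in this thread: the helper names
(parse_cpu_list, parse_affinity_mask, offline_targets) are hypothetical,
and it assumes the kernel exposes /proc/irq/<N>/effective_affinity, which
depends on the kernel configuration. It does not inspect the IOMMU IR
table itself.

```python
import glob


def parse_cpu_list(text):
    """Parse a kernel CPU list such as '0-7,16' into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def parse_affinity_mask(text):
    """Parse a hex CPU mask such as '00000000,000000ff' into a set of CPU numbers."""
    value = int(text.strip().replace(",", ""), 16)
    return {bit for bit in range(value.bit_length()) if (value >> bit) & 1}


def offline_targets(mask_text, online_text):
    """Return the sorted CPUs that are set in the mask but are not online."""
    return sorted(parse_affinity_mask(mask_text) - parse_cpu_list(online_text))


if __name__ == "__main__":
    # Assumption: these procfs/sysfs paths exist and are readable on the host.
    try:
        with open("/sys/devices/system/cpu/online") as f:
            online = f.read()
        for path in sorted(glob.glob("/proc/irq/*/effective_affinity")):
            with open(path) as f:
                bad = offline_targets(f.read(), online)
            if bad:
                print(path, "targets offline CPUs:", bad)
    except (OSError, ValueError) as err:
        print("could not read affinity information:", err)
```

Any IRQ reported by this check would be worth comparing against the
corresponding IR table entry; an empty result does not rule out a
remapping problem.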

Thanks,
Dexuan
