Netdev Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Dexuan Cui <decui@microsoft.com>
To: 'Thomas Gleixner' <tglx@linutronix.de>,
	'Saeed Mahameed' <saeed@kernel.org>,
	'Leon Romanovsky' <leon@kernel.org>
Cc: "'linux-pci@vger.kernel.org'" <linux-pci@vger.kernel.org>,
	"'netdev@vger.kernel.org'" <netdev@vger.kernel.org>,
	"'x86@kernel.org'" <x86@kernel.org>,
	Haiyang Zhang <haiyangz@microsoft.com>,
	"'linux-kernel@vger.kernel.org'" <linux-kernel@vger.kernel.org>
Subject: RE: [5.14-rc1] mlx5_core receives no interrupts with maxcpus=8
Date: Thu, 19 Aug 2021 20:41:29 +0000	[thread overview]
Message-ID: <BYAPR21MB1270EEBECCA3E9630FA2214FBFC09@BYAPR21MB1270.namprd21.prod.outlook.com> (raw)
In-Reply-To: <DM6PR21MB12752F080EEE916DACA9F8D6BFFF9@DM6PR21MB1275.namprd21.prod.outlook.com>

> From: Dexuan Cui
> Sent: Wednesday, August 18, 2021 2:08 PM
> 
> > From: Thomas Gleixner <tglx@linutronix.de>
> > Sent: Wednesday, July 21, 2021 2:17 PM
> > To: Dexuan Cui <decui@microsoft.com>; Saeed Mahameed
> >
> > On Mon, Jul 19 2021 at 20:33, Dexuan Cui wrote:
> > > This is a bare metal x86-64 host with Intel CPUs. Yes, I believe the
> > > issue is in the IOMMU Interrupt Remapping mechanism rather in the
> > > NIC driver. I just don't understand why bringing the CPUs online and
> > > offline can work around the issue. I'm trying to dump the IOMMU IR
> > > table entries to look for any error.
> >
> > can you please enable GENERIC_IRQ_DEBUGFS and provide the output of
> >
> > cat /sys/kernel/debug/irq/irqs/$THENICIRQS
> >
> > Thanks,
> >
> >         tglx
> 
> Sorry for the late response! I checked the below sys file, and the output is
> exactly the same in the good/bad cases -- in both cases, I use maxcpus=8;
> the only difference in the good case is that I online and then offline CPU 8~31:
> for i in `seq 8 31`;  do echo 1 >  /sys/devices/system/cpu/cpu$i/online; done
> for i in `seq 8 31`;  do echo 0 >  /sys/devices/system/cpu/cpu$i/online; done
> 
> # cat /sys/kernel/debug/irq/irqs/209
> ...

I tried the kernel parameter "intremap=nosid,no_x2apic_optout,nopost" but
it didn't help. Only "intremap=off" can work round the no interrupt issue.

When the no interrupt issue happens, irq 209's effective_affinity_list is 5.
I modified modify_irte() to print the irte->low, irte->high, and I also printed
the irte_index for irq 209, and they were all normal to me, and they were
exactly the same in the bad case and the good case -- it looks like, with
"intremap=on maxcpus=8", MSI-X on CPU5 can't work for the NIC device
(MSI-X on CPU5 works for other devices like a NVMe controller) , and somehow
"onlining and then offlining CPU 8~31" can "fix" the issue, which is really weird.

Thanks,
Dexuan

  reply	other threads:[~2021-08-19 20:41 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-15  0:38 Dexuan Cui
2021-07-15  1:11 ` Dexuan Cui
2021-07-18  9:12   ` Leon Romanovsky
2021-07-19 20:17     ` Saeed Mahameed
2021-07-19 20:33       ` Dexuan Cui
2021-07-21 21:16         ` Thomas Gleixner
2021-08-18 21:08           ` Dexuan Cui
2021-08-19 20:41             ` Dexuan Cui [this message]
     [not found] <draft-87h7fa1m37.ffs@tglx>
2021-08-28 20:44 ` Thomas Gleixner
2021-08-29 20:11   ` Dexuan Cui

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=BYAPR21MB1270EEBECCA3E9630FA2214FBFC09@BYAPR21MB1270.namprd21.prod.outlook.com \
    --to=decui@microsoft.com \
    --cc=haiyangz@microsoft.com \
    --cc=leon@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=saeed@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    --subject='RE: [5.14-rc1] mlx5_core receives no interrupts with maxcpus=8' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).