Netdev Archive on
help / color / mirror / Atom feed
From: Thomas Gleixner <>
To: Dexuan Cui <>,
	'Saeed Mahameed' <>,
	'Leon Romanovsky' <>
Cc: "''" <>,
	"''" <>,
	"''" <>,
	Haiyang Zhang <>,
	"''" <>
Subject: RE: [5.14-rc1] mlx5_core receives no interrupts with maxcpus=8
Date: Sat, 28 Aug 2021 22:44:09 +0200	[thread overview]
Message-ID: <87tuj9guzq.ffs@tglx> (raw)
In-Reply-To: <draft-87h7fa1m37.ffs@tglx>


On Sat, Aug 28 2021 at 01:53, Thomas Gleixner wrote:
> On Thu, Aug 19 2021 at 20:41, Dexuan Cui wrote:
>>> Sorry for the late response! I checked the below sys file, and the output is
>>> exactly the same in the good/bad cases -- in both cases, I use maxcpus=8;
>>> the only difference in the good case is that I online and then offline CPU 8~31:
>>> for i in `seq 8 31`;  do echo 1 >  /sys/devices/system/cpu/cpu$i/online; done
>>> for i in `seq 8 31`;  do echo 0 >  /sys/devices/system/cpu/cpu$i/online; done
>>> # cat /sys/kernel/debug/irq/irqs/209

Yes, that looks correct.

>> I tried the kernel parameter "intremap=nosid,no_x2apic_optout,nopost" but
>> it didn't help. Only "intremap=off" can work round the no interrupt issue.
>> When the no interrupt issue happens, irq 209's effective_affinity_list is 5.
>> I modified modify_irte() to print the irte->low, irte->high, and I also printed
>> the irte_index for irq 209, and they were all normal to me, and they were
>> exactly the same in the bad case and the good case -- it looks like, with
>> "intremap=on maxcpus=8", MSI-X on CPU5 can't work for the NIC device
>> (MSI-X on CPU5 works for other devices like a NVMe controller) , and somehow
>> "onlining and then offlining CPU 8~31" can "fix" the issue, which is really weird.

Just for the record: maxcpus=N is a dangerous boot option as it leaves
the non brought up CPUs in a state where they can be hit by MCE
broadcasting without being able to act on it. Which means you're
operating the system out of spec.

According to your debug output the interrupt in question belongs to the
INTEL-IR-3 interrupt domain, which means it hangs of IOMMU3, aka DMAR
unit 3.

To which DMAR/remap unit are the other unaffected devices connected to?



       reply	other threads:[~2021-08-28 20:44 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <draft-87h7fa1m37.ffs@tglx>
2021-08-28 20:44 ` Thomas Gleixner [this message]
2021-08-29 20:11   ` Dexuan Cui
2021-07-15  0:38 Dexuan Cui
2021-07-15  1:11 ` Dexuan Cui
2021-07-18  9:12   ` Leon Romanovsky
2021-07-19 20:17     ` Saeed Mahameed
2021-07-19 20:33       ` Dexuan Cui
2021-07-21 21:16         ` Thomas Gleixner
2021-08-18 21:08           ` Dexuan Cui
2021-08-19 20:41             ` Dexuan Cui

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87tuj9guzq.ffs@tglx \ \ \ \ \ \ \ \ \ \
    --subject='RE: [5.14-rc1] mlx5_core receives no interrupts with maxcpus=8' \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).