LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH v2 0/2] PCI/MSI: Clarify the IRQ sysfs ABI for PCI devices
@ 2021-08-20 22:37 Barry Song
  2021-08-20 22:37 ` [PATCH v2 1/2] PCI/MSI: Fix the confusing IRQ sysfs ABI for MSI-X Barry Song
  2021-08-20 22:37 ` [PATCH v2 2/2] Documentation: ABI: sysfs-bus-pci: Add description for IRQ entry Barry Song
  0 siblings, 2 replies; 19+ messages in thread
From: Barry Song @ 2021-08-20 22:37 UTC (permalink / raw)
  To: 21cnbao, bhelgaas, corbet
  Cc: Jonathan.Cameron, bilbao, gregkh, leon, linux-kernel, linux-pci,
	linuxarm, luzmaximilian, mchehab+huawei, schnelle, song.bao.hua

From: Barry Song <song.bao.hua@hisilicon.com>

/sys/bus/pci/devices/.../irq has been there for many years but it has never
been documented. This patch is trying to document it. Plus, irq ABI is very
confusing at this moment especially for MSI and MSI-x cases. MSI sets irq
to the first number in the vector, but MSI-X does nothing for this though
it saves default_irq in msix_setup_entries(). Weird the saved default_irq
for MSI-X is never used in pci_msix_shutdown(), which is quite different
with pci_msi_shutdown(). Thus, this patch also moves to show the first IRQ
number which is from the first msi_entry for MSI-X. Hopefully, this can
make irq ABI more clear and more consistent.

-v2:
  - split into two patches according to Bjorn's comments;
  - Add Greg's Acked-by, thanks for reviewing!
-v1:
  https://lore.kernel.org/lkml/20210813122650.25764-1-21cnbao@gmail.com/#t

Barry Song (2):
  PCI/MSI: Fix the confusing IRQ sysfs ABI for MSI-X
  Documentation: ABI: sysfs-bus-pci: Add description for IRQ entry

 Documentation/ABI/testing/sysfs-bus-pci | 8 ++++++++
 drivers/pci/msi.c                       | 6 ++++++
 2 files changed, 14 insertions(+)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v2 1/2] PCI/MSI: Fix the confusing IRQ sysfs ABI for MSI-X
  2021-08-20 22:37 [PATCH v2 0/2] PCI/MSI: Clarify the IRQ sysfs ABI for PCI devices Barry Song
@ 2021-08-20 22:37 ` Barry Song
  2021-08-20 23:33   ` Bjorn Helgaas
  2021-08-20 22:37 ` [PATCH v2 2/2] Documentation: ABI: sysfs-bus-pci: Add description for IRQ entry Barry Song
  1 sibling, 1 reply; 19+ messages in thread
From: Barry Song @ 2021-08-20 22:37 UTC (permalink / raw)
  To: 21cnbao, bhelgaas, corbet
  Cc: Jonathan.Cameron, bilbao, gregkh, leon, linux-kernel, linux-pci,
	linuxarm, luzmaximilian, mchehab+huawei, schnelle, song.bao.hua

From: Barry Song <song.bao.hua@hisilicon.com>

/sys/bus/pci/devices/.../irq sysfs ABI is very confusing at this
moment especially for MSI-X cases. While MSI sets IRQ to the first
number in the vector, MSI-X does nothing for this though it saves
default_irq in msix_setup_entries(). Weird the saved default_irq
for MSI-X is never used in pci_msix_shutdown(), which is quite
different with pci_msi_shutdown(). Thus, this patch moves to show
the first IRQ number which is from the first msi_entry for MSI-X.
Hopefully, this can make IRQ ABI more clear and more consistent.

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
---
 drivers/pci/msi.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 9232255..6bbf81b 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -771,6 +771,7 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
 	int ret;
 	u16 control;
 	void __iomem *base;
+	struct msi_desc *desc;
 
 	/* Ensure MSI-X is disabled while it is set up */
 	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0);
@@ -814,6 +815,10 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
 	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
 
 	pcibios_free_irq(dev);
+
+	desc = first_pci_msi_entry(dev);
+	dev->irq = desc->irq;
+
 	return 0;
 
 out_avail:
@@ -1024,6 +1029,7 @@ static void pci_msix_shutdown(struct pci_dev *dev)
 	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0);
 	pci_intx_for_msi(dev, 1);
 	dev->msix_enabled = 0;
+	dev->irq = entry->msi_attrib.default_irq;
 	pcibios_alloc_irq(dev);
 }
 
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v2 2/2] Documentation: ABI: sysfs-bus-pci: Add description for IRQ entry
  2021-08-20 22:37 [PATCH v2 0/2] PCI/MSI: Clarify the IRQ sysfs ABI for PCI devices Barry Song
  2021-08-20 22:37 ` [PATCH v2 1/2] PCI/MSI: Fix the confusing IRQ sysfs ABI for MSI-X Barry Song
@ 2021-08-20 22:37 ` Barry Song
  1 sibling, 0 replies; 19+ messages in thread
From: Barry Song @ 2021-08-20 22:37 UTC (permalink / raw)
  To: 21cnbao, bhelgaas, corbet
  Cc: Jonathan.Cameron, bilbao, gregkh, leon, linux-kernel, linux-pci,
	linuxarm, luzmaximilian, mchehab+huawei, schnelle, song.bao.hua

From: Barry Song <song.bao.hua@hisilicon.com>

/sys/bus/pci/devices/.../irq has been there for many years but it
has never been documented. This patch is trying to document it.

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
---
 Documentation/ABI/testing/sysfs-bus-pci | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
index 793cbb7..8d42385 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci
+++ b/Documentation/ABI/testing/sysfs-bus-pci
@@ -96,6 +96,14 @@ Description:
 		This attribute indicates the mode that the irq vector named by
 		the file is in (msi vs. msix)
 
+What:		/sys/bus/pci/devices/.../irq
+Date:		August 2021
+Contact:	Barry Song <song.bao.hua@hisilicon.com>
+Description:
+		Historically this attribute represent the IRQ line which runs
+		from the PCI device to the Interrupt controller. With MSI and
+		MSI-X, this attribute is the first IRQ number of IRQ vectors.
+
 What:		/sys/bus/pci/devices/.../remove
 Date:		January 2009
 Contact:	Linux PCI developers <linux-pci@vger.kernel.org>
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/2] PCI/MSI: Fix the confusing IRQ sysfs ABI for MSI-X
  2021-08-20 22:37 ` [PATCH v2 1/2] PCI/MSI: Fix the confusing IRQ sysfs ABI for MSI-X Barry Song
@ 2021-08-20 23:33   ` Bjorn Helgaas
  2021-08-21 10:42     ` Marc Zyngier
  0 siblings, 1 reply; 19+ messages in thread
From: Bjorn Helgaas @ 2021-08-20 23:33 UTC (permalink / raw)
  To: Barry Song
  Cc: bhelgaas, corbet, Jonathan.Cameron, bilbao, gregkh, leon,
	linux-kernel, linux-pci, linuxarm, luzmaximilian, mchehab+huawei,
	schnelle, song.bao.hua, Thomas Gleixner, Marc Zyngier

[+cc Thomas, Marc]

On Sat, Aug 21, 2021 at 10:37:43AM +1200, Barry Song wrote:
> From: Barry Song <song.bao.hua@hisilicon.com>
> 
> /sys/bus/pci/devices/.../irq sysfs ABI is very confusing at this
> moment especially for MSI-X cases. 

AFAICT this patch *only* affects MSI-X.  So are you saying the sysfs
ABI is fine for MSI but confusing for MSI-X?

> While MSI sets IRQ to the first
> number in the vector, MSI-X does nothing for this though it saves
> default_irq in msix_setup_entries(). Weird the saved default_irq
> for MSI-X is never used in pci_msix_shutdown(), which is quite
> different with pci_msi_shutdown(). Thus, this patch moves to show
> the first IRQ number which is from the first msi_entry for MSI-X.
> Hopefully, this can make IRQ ABI more clear and more consistent.
> 
> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
> ---
>  drivers/pci/msi.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 9232255..6bbf81b 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -771,6 +771,7 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
>  	int ret;
>  	u16 control;
>  	void __iomem *base;
> +	struct msi_desc *desc;
>  
>  	/* Ensure MSI-X is disabled while it is set up */
>  	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0);
> @@ -814,6 +815,10 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
>  	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
>  
>  	pcibios_free_irq(dev);
> +
> +	desc = first_pci_msi_entry(dev);
> +	dev->irq = desc->irq;

This change is not primarily about sysfs.  This is about changing
"dev->irq" when MSI-X is enabled, and it's only incidental that sysfs
reflects that.

So we need to know the effect of changing dev->irq.  Drivers may use
the value of dev->irq, and I'm *guessing* this change shouldn't break
them since we already do this for MSI, but I'd like some more expert
opinion than mine :)

For MSI we have:

  msi_capability_init
    msi_setup_entry
      entry = alloc_msi_entry(nvec)
      entry->msi_attrib.default_irq = dev->irq;     /* Save IOAPIC IRQ */
    dev->irq = entry->irq;

  pci_msi_shutdown
    /* Restore dev->irq to its default pin-assertion IRQ */
    dev->irq = desc->msi_attrib.default_irq;

and for MSI-X we have:

  msix_capability_init
    msix_setup_entries
      for (i = 0; i < nvec; i++)
        entry = alloc_msi_entry(1)
	entry->msi_attrib.default_irq = dev->irq;

  pci_msix_shutdown
    for_each_pci_msi_entry(entry, dev)
      __pci_msix_desc_mask_irq
+   dev->irq = entry->msi_attrib.default_irq;   # added by this patch


Things that seem strange to me:

  - The msi_setup_entry() comment "Save IOAPIC IRQ" seems needlessly
    specific; maybe it should be "INTx IRQ".

  - The pci_msi_shutdown() comment "Restore ... pin-assertion IRQ"
    should match the msi_setup_entry() one, e.g., maybe it should also
    be "INTx IRQ".  There are no INTx or IOAPIC pins in PCIe.

  - The only use of .default_irq is to save and restore dev->irq, so
    it looks like a per-device thing, not a per-vector thing.

    In msi_setup_entry() there's only one msi_entry, so there's only
    one saved .default_irq.

    In msix_setup_entries(), we get nvecs msi_entry structs, and we
    get a saved .default_irq in each one?

  - In pci_msix_shutdown() we restore dev->irq from the .default_irq
    of whatever "entry" contains after the for_each_pci_msi_entry()
    loop.  Is "entry" defined there?  I don't know what the cursor
    contains after a list_for_each_entry() loop exits.

>  	return 0;
>  
>  out_avail:
> @@ -1024,6 +1029,7 @@ static void pci_msix_shutdown(struct pci_dev *dev)
>  	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0);
>  	pci_intx_for_msi(dev, 1);
>  	dev->msix_enabled = 0;
> +	dev->irq = entry->msi_attrib.default_irq;
>  	pcibios_alloc_irq(dev);
>  }
>  
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/2] PCI/MSI: Fix the confusing IRQ sysfs ABI for MSI-X
  2021-08-20 23:33   ` Bjorn Helgaas
@ 2021-08-21 10:42     ` Marc Zyngier
  2021-08-21 22:14       ` Barry Song
  2021-08-24 20:51       ` Barry Song
  0 siblings, 2 replies; 19+ messages in thread
From: Marc Zyngier @ 2021-08-21 10:42 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Barry Song, bhelgaas, corbet, Jonathan.Cameron, bilbao, gregkh,
	leon, linux-kernel, linux-pci, linuxarm, luzmaximilian,
	mchehab+huawei, schnelle, song.bao.hua, Thomas Gleixner

Hi Bjorn,

On Sat, 21 Aug 2021 00:33:28 +0100,
Bjorn Helgaas <helgaas@kernel.org> wrote:
> 
> [+cc Thomas, Marc]
> 
> On Sat, Aug 21, 2021 at 10:37:43AM +1200, Barry Song wrote:
> > From: Barry Song <song.bao.hua@hisilicon.com>
> > 
> > /sys/bus/pci/devices/.../irq sysfs ABI is very confusing at this
> > moment especially for MSI-X cases. 
> 
> AFAICT this patch *only* affects MSI-X.  So are you saying the sysfs
> ABI is fine for MSI but confusing for MSI-X?
> 
> > While MSI sets IRQ to the first
> > number in the vector, MSI-X does nothing for this though it saves
> > default_irq in msix_setup_entries(). Weird the saved default_irq
> > for MSI-X is never used in pci_msix_shutdown(), which is quite
> > different with pci_msi_shutdown(). Thus, this patch moves to show
> > the first IRQ number which is from the first msi_entry for MSI-X.
> > Hopefully, this can make IRQ ABI more clear and more consistent.
> > 
> > Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
> > ---
> >  drivers/pci/msi.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> > index 9232255..6bbf81b 100644
> > --- a/drivers/pci/msi.c
> > +++ b/drivers/pci/msi.c
> > @@ -771,6 +771,7 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
> >  	int ret;
> >  	u16 control;
> >  	void __iomem *base;
> > +	struct msi_desc *desc;
> >  
> >  	/* Ensure MSI-X is disabled while it is set up */
> >  	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0);
> > @@ -814,6 +815,10 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
> >  	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
> >  
> >  	pcibios_free_irq(dev);
> > +
> > +	desc = first_pci_msi_entry(dev);
> > +	dev->irq = desc->irq;
> 
> This change is not primarily about sysfs.  This is about changing
> "dev->irq" when MSI-X is enabled, and it's only incidental that sysfs
> reflects that.
> 
> So we need to know the effect of changing dev->irq.  Drivers may use
> the value of dev->irq, and I'm *guessing* this change shouldn't break
> them since we already do this for MSI, but I'd like some more expert
> opinion than mine :)
> 
> For MSI we have:
> 
>   msi_capability_init
>     msi_setup_entry
>       entry = alloc_msi_entry(nvec)
>       entry->msi_attrib.default_irq = dev->irq;     /* Save IOAPIC IRQ */
>     dev->irq = entry->irq;
> 
>   pci_msi_shutdown
>     /* Restore dev->irq to its default pin-assertion IRQ */
>     dev->irq = desc->msi_attrib.default_irq;
> 
> and for MSI-X we have:
> 
>   msix_capability_init
>     msix_setup_entries
>       for (i = 0; i < nvec; i++)
>         entry = alloc_msi_entry(1)
> 	entry->msi_attrib.default_irq = dev->irq;
> 
>   pci_msix_shutdown
>     for_each_pci_msi_entry(entry, dev)
>       __pci_msix_desc_mask_irq
> +   dev->irq = entry->msi_attrib.default_irq;   # added by this patch
> 
> 
> Things that seem strange to me:
> 
>   - The msi_setup_entry() comment "Save IOAPIC IRQ" seems needlessly
>     specific; maybe it should be "INTx IRQ".
> 
>   - The pci_msi_shutdown() comment "Restore ... pin-assertion IRQ"
>     should match the msi_setup_entry() one, e.g., maybe it should also
>     be "INTx IRQ".  There are no INTx or IOAPIC pins in PCIe.
> 
>   - The only use of .default_irq is to save and restore dev->irq, so
>     it looks like a per-device thing, not a per-vector thing.
> 
>     In msi_setup_entry() there's only one msi_entry, so there's only
>     one saved .default_irq.
> 
>     In msix_setup_entries(), we get nvecs msi_entry structs, and we
>     get a saved .default_irq in each one?

That's a key point.

Old-school PCI/MSI is represented by a single interrupt, and you
*could* somehow make it relatively easy for drivers that only
understand INTx to migrate to MSI if you replaced whatever is held in
dev->irq (which should only represent the INTx mapping) with the MSI
interrupt number. Which I guess is what the MSI code is doing.

This is the 21st century, and nobody should ever rely on such horror,
but I'm sure we do have such drivers in the tree. Boo.

However, this *cannot* hold true for Multi-MSI, nor MSI-X, because
there is a plurality of interrupts. Even worse, for MSI-X, there is
zero guarantee that the allocated interrupts will be in a contiguous
space.

Given that, what is dev->irq good for? "Absolutely Nothing! (say it
again!)".

MSI-X is not something you can "accidentally" use. You have to
actively embrace it. In all honesty, this patch tries to move in the
wrong direction. If anything, we should kill this hack altogether and
fix the (handful of?) drivers that rely on it. That'd actually be a
good way to find whether they are still worth keeping in the tree. And
if it breaks too many of them, then at least we'll know where we
stand.

I'd be tempted to leave the below patch simmer in -next for a few
weeks and see if how many people shout:

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index e5e75331b415..2be9a01cbe72 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -591,7 +591,6 @@ msi_setup_entry(struct pci_dev *dev, int nvec, struct irq_affinity *affd)
 	entry->msi_attrib.is_virtual    = 0;
 	entry->msi_attrib.entry_nr	= 0;
 	entry->msi_attrib.maskbit	= !!(control & PCI_MSI_FLAGS_MASKBIT);
-	entry->msi_attrib.default_irq	= dev->irq;	/* Save IOAPIC IRQ */
 	entry->msi_attrib.multi_cap	= (control & PCI_MSI_FLAGS_QMASK) >> 1;
 	entry->msi_attrib.multiple	= ilog2(__roundup_pow_of_two(nvec));
 
@@ -682,7 +681,6 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
 	dev->msi_enabled = 1;
 
 	pcibios_free_irq(dev);
-	dev->irq = entry->irq;
 	return 0;
 }
 
@@ -742,7 +740,6 @@ static int msix_setup_entries(struct pci_dev *dev, void __iomem *base,
 		entry->msi_attrib.is_virtual =
 			entry->msi_attrib.entry_nr >= vec_count;
 
-		entry->msi_attrib.default_irq	= dev->irq;
 		entry->mask_base		= base;
 
 		addr = pci_msix_desc_addr(entry);
@@ -964,8 +961,6 @@ static void pci_msi_shutdown(struct pci_dev *dev)
 	mask = msi_mask(desc->msi_attrib.multi_cap);
 	msi_mask_irq(desc, mask, 0);
 
-	/* Restore dev->irq to its default pin-assertion IRQ */
-	dev->irq = desc->msi_attrib.default_irq;
 	pcibios_alloc_irq(dev);
 }
 
diff --git a/include/linux/msi.h b/include/linux/msi.h
index e8bdcb83172b..a631664c1c38 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -114,7 +114,6 @@ struct ti_sci_inta_msi_desc {
  * @maskbit:	[PCI MSI/X] Mask-Pending bit supported?
  * @is_64:	[PCI MSI/X] Address size: 0=32bit 1=64bit
  * @entry_nr:	[PCI MSI/X] Entry which is described by this descriptor
- * @default_irq:[PCI MSI/X] The default pre-assigned non-MSI irq
  * @mask_pos:	[PCI MSI]   Mask register position
  * @mask_base:	[PCI MSI-X] Mask register base address
  * @platform:	[platform]  Platform device specific msi descriptor data
@@ -148,7 +147,6 @@ struct msi_desc {
 				u8	is_64		: 1;
 				u8	is_virtual	: 1;
 				u16	entry_nr;
-				unsigned default_irq;
 			} msi_attrib;
 			union {
 				u8	mask_pos;

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/2] PCI/MSI: Fix the confusing IRQ sysfs ABI for MSI-X
  2021-08-21 10:42     ` Marc Zyngier
@ 2021-08-21 22:14       ` Barry Song
  2021-08-21 22:41         ` Barry Song
  2021-08-23 10:30         ` Marc Zyngier
  2021-08-24 20:51       ` Barry Song
  1 sibling, 2 replies; 19+ messages in thread
From: Barry Song @ 2021-08-21 22:14 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Bjorn Helgaas, Bjorn Helgaas, Jonathan Corbet, Jonathan.Cameron,
	bilbao, Greg Kroah-Hartman, leon, LKML, linux-pci, Linuxarm,
	luzmaximilian, mchehab+huawei, schnelle, Barry Song,
	Thomas Gleixner

On Sat, Aug 21, 2021 at 10:42 PM Marc Zyngier <maz@kernel.org> wrote:
>
> Hi Bjorn,
>
> On Sat, 21 Aug 2021 00:33:28 +0100,
> Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > [+cc Thomas, Marc]
> >
> > On Sat, Aug 21, 2021 at 10:37:43AM +1200, Barry Song wrote:
> > > From: Barry Song <song.bao.hua@hisilicon.com>
> > >
> > > /sys/bus/pci/devices/.../irq sysfs ABI is very confusing at this
> > > moment especially for MSI-X cases.
> >
> > AFAICT this patch *only* affects MSI-X.  So are you saying the sysfs
> > ABI is fine for MSI but confusing for MSI-X?
> >
> > > While MSI sets IRQ to the first
> > > number in the vector, MSI-X does nothing for this though it saves
> > > default_irq in msix_setup_entries(). Weird the saved default_irq
> > > for MSI-X is never used in pci_msix_shutdown(), which is quite
> > > different with pci_msi_shutdown(). Thus, this patch moves to show
> > > the first IRQ number which is from the first msi_entry for MSI-X.
> > > Hopefully, this can make IRQ ABI more clear and more consistent.
> > >
> > > Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
> > > ---
> > >  drivers/pci/msi.c | 6 ++++++
> > >  1 file changed, 6 insertions(+)
> > >
> > > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> > > index 9232255..6bbf81b 100644
> > > --- a/drivers/pci/msi.c
> > > +++ b/drivers/pci/msi.c
> > > @@ -771,6 +771,7 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
> > >     int ret;
> > >     u16 control;
> > >     void __iomem *base;
> > > +   struct msi_desc *desc;
> > >
> > >     /* Ensure MSI-X is disabled while it is set up */
> > >     pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0);
> > > @@ -814,6 +815,10 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
> > >     pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
> > >
> > >     pcibios_free_irq(dev);
> > > +
> > > +   desc = first_pci_msi_entry(dev);
> > > +   dev->irq = desc->irq;
> >
> > This change is not primarily about sysfs.  This is about changing
> > "dev->irq" when MSI-X is enabled, and it's only incidental that sysfs
> > reflects that.
> >
> > So we need to know the effect of changing dev->irq.  Drivers may use
> > the value of dev->irq, and I'm *guessing* this change shouldn't break
> > them since we already do this for MSI, but I'd like some more expert
> > opinion than mine :)
> >
> > For MSI we have:
> >
> >   msi_capability_init
> >     msi_setup_entry
> >       entry = alloc_msi_entry(nvec)
> >       entry->msi_attrib.default_irq = dev->irq;     /* Save IOAPIC IRQ */
> >     dev->irq = entry->irq;
> >
> >   pci_msi_shutdown
> >     /* Restore dev->irq to its default pin-assertion IRQ */
> >     dev->irq = desc->msi_attrib.default_irq;
> >
> > and for MSI-X we have:
> >
> >   msix_capability_init
> >     msix_setup_entries
> >       for (i = 0; i < nvec; i++)
> >         entry = alloc_msi_entry(1)
> >       entry->msi_attrib.default_irq = dev->irq;
> >
> >   pci_msix_shutdown
> >     for_each_pci_msi_entry(entry, dev)
> >       __pci_msix_desc_mask_irq
> > +   dev->irq = entry->msi_attrib.default_irq;   # added by this patch
> >
> >
> > Things that seem strange to me:
> >
> >   - The msi_setup_entry() comment "Save IOAPIC IRQ" seems needlessly
> >     specific; maybe it should be "INTx IRQ".
> >
> >   - The pci_msi_shutdown() comment "Restore ... pin-assertion IRQ"
> >     should match the msi_setup_entry() one, e.g., maybe it should also
> >     be "INTx IRQ".  There are no INTx or IOAPIC pins in PCIe.
> >
> >   - The only use of .default_irq is to save and restore dev->irq, so
> >     it looks like a per-device thing, not a per-vector thing.
> >
> >     In msi_setup_entry() there's only one msi_entry, so there's only
> >     one saved .default_irq.
> >
> >     In msix_setup_entries(), we get nvecs msi_entry structs, and we
> >     get a saved .default_irq in each one?
>
> That's a key point.
>
> Old-school PCI/MSI is represented by a single interrupt, and you
> *could* somehow make it relatively easy for drivers that only
> understand INTx to migrate to MSI if you replaced whatever is held in
> dev->irq (which should only represent the INTx mapping) with the MSI
> interrupt number. Which I guess is what the MSI code is doing.
>
> This is the 21st century, and nobody should ever rely on such horror,
> but I'm sure we do have such drivers in the tree. Boo.
>
> However, this *cannot* hold true for Multi-MSI, nor MSI-X, because
> there is a plurality of interrupts. Even worse, for MSI-X, there is
> zero guarantee that the allocated interrupts will be in a contiguous
> space.
>
> Given that, what is dev->irq good for? "Absolutely Nothing! (say it
> again!)".
>

The only thing is that dev->irq is an sysfs ABI to userspace. Due to
the inconsistency
between legacy PCI INTx, MSI, MSI-X, this ABI should have been
absolutely broken nowadays.
This is actually what the patchset was originally aiming at to fix.

One more question from me is that does dev->irq actually hold any
valid hardware INTx
information while hardware is using MSI-X? At least in my hardware,
sysfs ABI for PCI is all "0".

root@ubuntu:/sys/devices/pci0000:7c/0000:7c:00.0/0000:7d:00.3# cat irq
0

root@ubuntu:/sys/devices/pci0000:7c/0000:7c:00.0/0000:7d:00.3# ls -l msi_irqs/*
-r--r--r-- 1 root root 4096 Aug 21 22:04 msi_irqs/499
-r--r--r-- 1 root root 4096 Aug 21 22:04 msi_irqs/500
-r--r--r-- 1 root root 4096 Aug 21 22:04 msi_irqs/501
...
root@ubuntu:/sys/devices/pci0000:7c/0000:7c:00.0/0000:7d:00.3# cat msi_irqs/499
msix

Not quite sure how it is going on different hardware platforms.

> MSI-X is not something you can "accidentally" use. You have to
> actively embrace it. In all honesty, this patch tries to move in the
> wrong direction. If anything, we should kill this hack altogether and
> fix the (handful of?) drivers that rely on it. That'd actually be a
> good way to find whether they are still worth keeping in the tree. And
> if it breaks too many of them, then at least we'll know where we
> stand.
>
> I'd be tempted to leave the below patch simmer in -next for a few
> weeks and see if how many people shout:

This looks like a more proper direction to go.
but here i am wondering how sysfs ABI document should follow the below change
doc is patch 2/2:
https://lore.kernel.org/lkml/20210820223744.8439-3-21cnbao@gmail.com/

On the other hand, my feeling is that nobody should depend on sysfs
irq entry nowadays.
For example, userspace irqbalance is actually using /sys/devices/.../msi_irqs/
So probably we should set this ABI invisible when devices are using
MSI or MSI-X?

>
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index e5e75331b415..2be9a01cbe72 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -591,7 +591,6 @@ msi_setup_entry(struct pci_dev *dev, int nvec, struct irq_affinity *affd)
>         entry->msi_attrib.is_virtual    = 0;
>         entry->msi_attrib.entry_nr      = 0;
>         entry->msi_attrib.maskbit       = !!(control & PCI_MSI_FLAGS_MASKBIT);
> -       entry->msi_attrib.default_irq   = dev->irq;     /* Save IOAPIC IRQ */
>         entry->msi_attrib.multi_cap     = (control & PCI_MSI_FLAGS_QMASK) >> 1;
>         entry->msi_attrib.multiple      = ilog2(__roundup_pow_of_two(nvec));
>
> @@ -682,7 +681,6 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
>         dev->msi_enabled = 1;
>
>         pcibios_free_irq(dev);
> -       dev->irq = entry->irq;
>         return 0;
>  }
>
> @@ -742,7 +740,6 @@ static int msix_setup_entries(struct pci_dev *dev, void __iomem *base,
>                 entry->msi_attrib.is_virtual =
>                         entry->msi_attrib.entry_nr >= vec_count;
>
> -               entry->msi_attrib.default_irq   = dev->irq;
>                 entry->mask_base                = base;
>
>                 addr = pci_msix_desc_addr(entry);
> @@ -964,8 +961,6 @@ static void pci_msi_shutdown(struct pci_dev *dev)
>         mask = msi_mask(desc->msi_attrib.multi_cap);
>         msi_mask_irq(desc, mask, 0);
>
> -       /* Restore dev->irq to its default pin-assertion IRQ */
> -       dev->irq = desc->msi_attrib.default_irq;
>         pcibios_alloc_irq(dev);
>  }
>
> diff --git a/include/linux/msi.h b/include/linux/msi.h
> index e8bdcb83172b..a631664c1c38 100644
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -114,7 +114,6 @@ struct ti_sci_inta_msi_desc {
>   * @maskbit:   [PCI MSI/X] Mask-Pending bit supported?
>   * @is_64:     [PCI MSI/X] Address size: 0=32bit 1=64bit
>   * @entry_nr:  [PCI MSI/X] Entry which is described by this descriptor
> - * @default_irq:[PCI MSI/X] The default pre-assigned non-MSI irq
>   * @mask_pos:  [PCI MSI]   Mask register position
>   * @mask_base: [PCI MSI-X] Mask register base address
>   * @platform:  [platform]  Platform device specific msi descriptor data
> @@ -148,7 +147,6 @@ struct msi_desc {
>                                 u8      is_64           : 1;
>                                 u8      is_virtual      : 1;
>                                 u16     entry_nr;
> -                               unsigned default_irq;
>                         } msi_attrib;
>                         union {
>                                 u8      mask_pos;
>
> Thanks,
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.

Thanks
barry

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/2] PCI/MSI: Fix the confusing IRQ sysfs ABI for MSI-X
  2021-08-21 22:14       ` Barry Song
@ 2021-08-21 22:41         ` Barry Song
  2021-08-23 10:33           ` Marc Zyngier
  2021-08-24 19:25           ` Bjorn Helgaas
  2021-08-23 10:30         ` Marc Zyngier
  1 sibling, 2 replies; 19+ messages in thread
From: Barry Song @ 2021-08-21 22:41 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Bjorn Helgaas, Bjorn Helgaas, Jonathan Corbet, Jonathan.Cameron,
	bilbao, Greg Kroah-Hartman, leon, LKML, linux-pci, Linuxarm,
	luzmaximilian, mchehab+huawei, schnelle, Barry Song,
	Thomas Gleixner

On Sun, Aug 22, 2021 at 10:14 AM Barry Song <21cnbao@gmail.com> wrote:
>
> On Sat, Aug 21, 2021 at 10:42 PM Marc Zyngier <maz@kernel.org> wrote:
> >
> > Hi Bjorn,
> >
> > On Sat, 21 Aug 2021 00:33:28 +0100,
> > Bjorn Helgaas <helgaas@kernel.org> wrote:
> > >
> > > [+cc Thomas, Marc]
> > >
> > > On Sat, Aug 21, 2021 at 10:37:43AM +1200, Barry Song wrote:
> > > > From: Barry Song <song.bao.hua@hisilicon.com>
> > > >
> > > > /sys/bus/pci/devices/.../irq sysfs ABI is very confusing at this
> > > > moment especially for MSI-X cases.
> > >
> > > AFAICT this patch *only* affects MSI-X.  So are you saying the sysfs
> > > ABI is fine for MSI but confusing for MSI-X?
> > >
> > > > While MSI sets IRQ to the first
> > > > number in the vector, MSI-X does nothing for this though it saves
> > > > default_irq in msix_setup_entries(). Weird the saved default_irq
> > > > for MSI-X is never used in pci_msix_shutdown(), which is quite
> > > > different with pci_msi_shutdown(). Thus, this patch moves to show
> > > > the first IRQ number which is from the first msi_entry for MSI-X.
> > > > Hopefully, this can make IRQ ABI more clear and more consistent.
> > > >
> > > > Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > > Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
> > > > ---
> > > >  drivers/pci/msi.c | 6 ++++++
> > > >  1 file changed, 6 insertions(+)
> > > >
> > > > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> > > > index 9232255..6bbf81b 100644
> > > > --- a/drivers/pci/msi.c
> > > > +++ b/drivers/pci/msi.c
> > > > @@ -771,6 +771,7 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
> > > >     int ret;
> > > >     u16 control;
> > > >     void __iomem *base;
> > > > +   struct msi_desc *desc;
> > > >
> > > >     /* Ensure MSI-X is disabled while it is set up */
> > > >     pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0);
> > > > @@ -814,6 +815,10 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
> > > >     pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
> > > >
> > > >     pcibios_free_irq(dev);
> > > > +
> > > > +   desc = first_pci_msi_entry(dev);
> > > > +   dev->irq = desc->irq;
> > >
> > > This change is not primarily about sysfs.  This is about changing
> > > "dev->irq" when MSI-X is enabled, and it's only incidental that sysfs
> > > reflects that.
> > >
> > > So we need to know the effect of changing dev->irq.  Drivers may use
> > > the value of dev->irq, and I'm *guessing* this change shouldn't break
> > > them since we already do this for MSI, but I'd like some more expert
> > > opinion than mine :)
> > >
> > > For MSI we have:
> > >
> > >   msi_capability_init
> > >     msi_setup_entry
> > >       entry = alloc_msi_entry(nvec)
> > >       entry->msi_attrib.default_irq = dev->irq;     /* Save IOAPIC IRQ */
> > >     dev->irq = entry->irq;
> > >
> > >   pci_msi_shutdown
> > >     /* Restore dev->irq to its default pin-assertion IRQ */
> > >     dev->irq = desc->msi_attrib.default_irq;
> > >
> > > and for MSI-X we have:
> > >
> > >   msix_capability_init
> > >     msix_setup_entries
> > >       for (i = 0; i < nvec; i++)
> > >         entry = alloc_msi_entry(1)
> > >       entry->msi_attrib.default_irq = dev->irq;
> > >
> > >   pci_msix_shutdown
> > >     for_each_pci_msi_entry(entry, dev)
> > >       __pci_msix_desc_mask_irq
> > > +   dev->irq = entry->msi_attrib.default_irq;   # added by this patch
> > >
> > >
> > > Things that seem strange to me:
> > >
> > >   - The msi_setup_entry() comment "Save IOAPIC IRQ" seems needlessly
> > >     specific; maybe it should be "INTx IRQ".
> > >
> > >   - The pci_msi_shutdown() comment "Restore ... pin-assertion IRQ"
> > >     should match the msi_setup_entry() one, e.g., maybe it should also
> > >     be "INTx IRQ".  There are no INTx or IOAPIC pins in PCIe.
> > >
> > >   - The only use of .default_irq is to save and restore dev->irq, so
> > >     it looks like a per-device thing, not a per-vector thing.
> > >
> > >     In msi_setup_entry() there's only one msi_entry, so there's only
> > >     one saved .default_irq.
> > >
> > >     In msix_setup_entries(), we get nvecs msi_entry structs, and we
> > >     get a saved .default_irq in each one?
> >
> > That's a key point.
> >
> > Old-school PCI/MSI is represented by a single interrupt, and you
> > *could* somehow make it relatively easy for drivers that only
> > understand INTx to migrate to MSI if you replaced whatever is held in
> > dev->irq (which should only represent the INTx mapping) with the MSI
> > interrupt number. Which I guess is what the MSI code is doing.
> >
> > This is the 21st century, and nobody should ever rely on such horror,
> > but I'm sure we do have such drivers in the tree. Boo.
> >
> > However, this *cannot* hold true for Multi-MSI, nor MSI-X, because
> > there is a plurality of interrupts. Even worse, for MSI-X, there is
> > zero guarantee that the allocated interrupts will be in a contiguous
> > space.
> >
> > Given that, what is dev->irq good for? "Absolutely Nothing! (say it
> > again!)".
> >
>
> The only thing is that dev->irq is an sysfs ABI to userspace. Due to
> the inconsistency
> between legacy PCI INTx, MSI, MSI-X, this ABI should have been
> absolutely broken nowadays.
> This is actually what the patchset was originally aiming at to fix.
>
> One more question from me is that does dev->irq actually hold any
> valid hardware INTx
> information while hardware is using MSI-X? At least in my hardware,
> sysfs ABI for PCI is all "0".
>
> root@ubuntu:/sys/devices/pci0000:7c/0000:7c:00.0/0000:7d:00.3# cat irq
> 0
>
> root@ubuntu:/sys/devices/pci0000:7c/0000:7c:00.0/0000:7d:00.3# ls -l msi_irqs/*
> -r--r--r-- 1 root root 4096 Aug 21 22:04 msi_irqs/499
> -r--r--r-- 1 root root 4096 Aug 21 22:04 msi_irqs/500
> -r--r--r-- 1 root root 4096 Aug 21 22:04 msi_irqs/501
> ...
> root@ubuntu:/sys/devices/pci0000:7c/0000:7c:00.0/0000:7d:00.3# cat msi_irqs/499
> msix
>
> Not quite sure how it is going on different hardware platforms.
>
> > MSI-X is not something you can "accidentally" use. You have to
> > actively embrace it. In all honesty, this patch tries to move in the
> > wrong direction. If anything, we should kill this hack altogether and
> > fix the (handful of?) drivers that rely on it. That'd actually be a
> > good way to find whether they are still worth keeping in the tree. And
> > if it breaks too many of them, then at least we'll know where we
> > stand.
> >
> > I'd be tempted to leave the below patch simmer in -next for a few
> > weeks and see if how many people shout:
>
> This looks like a more proper direction to go.
> but here i am wondering how sysfs ABI document should follow the below change
> doc is patch 2/2:
> https://lore.kernel.org/lkml/20210820223744.8439-3-21cnbao@gmail.com/
>
> On the other hand, my feeling is that nobody should depend on sysfs
> irq entry nowadays.
> For example, userspace irqbalance is actually using /sys/devices/.../msi_irqs/
> So probably we should set this ABI invisible when devices are using
> MSI or MSI-X?

i mean something like the below,

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 5d63df7..1323841 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -26,6 +26,7 @@
 #include <linux/slab.h>
 #include <linux/vgaarb.h>
 #include <linux/pm_runtime.h>
+#include <linux/msi.h>
 #include <linux/of.h>
 #include "pci.h"

@@ -1437,6 +1438,16 @@ static umode_t pci_dev_attrs_are_visible(struct
kobject *kobj,
                if ((pdev->class >> 8) != PCI_CLASS_DISPLAY_VGA)
                        return 0;

+#ifdef CONFIG_PCI_MSI
+       /*
+        * if devices are MSI and MSI-X, IRQ sysfs ABI is meaningless
+        * and broken
+        */
+       if (a == &dev_attr_irq.attr)
+               if (first_pci_msi_entry(pdev))
+                       return 0;
+#endif
+
        return a->mode;
 }

>
> >
> > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> > index e5e75331b415..2be9a01cbe72 100644
> > --- a/drivers/pci/msi.c
> > +++ b/drivers/pci/msi.c
> > @@ -591,7 +591,6 @@ msi_setup_entry(struct pci_dev *dev, int nvec, struct irq_affinity *affd)
> >         entry->msi_attrib.is_virtual    = 0;
> >         entry->msi_attrib.entry_nr      = 0;
> >         entry->msi_attrib.maskbit       = !!(control & PCI_MSI_FLAGS_MASKBIT);
> > -       entry->msi_attrib.default_irq   = dev->irq;     /* Save IOAPIC IRQ */
> >         entry->msi_attrib.multi_cap     = (control & PCI_MSI_FLAGS_QMASK) >> 1;
> >         entry->msi_attrib.multiple      = ilog2(__roundup_pow_of_two(nvec));
> >
> > @@ -682,7 +681,6 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
> >         dev->msi_enabled = 1;
> >
> >         pcibios_free_irq(dev);
> > -       dev->irq = entry->irq;
> >         return 0;
> >  }
> >
> > @@ -742,7 +740,6 @@ static int msix_setup_entries(struct pci_dev *dev, void __iomem *base,
> >                 entry->msi_attrib.is_virtual =
> >                         entry->msi_attrib.entry_nr >= vec_count;
> >
> > -               entry->msi_attrib.default_irq   = dev->irq;
> >                 entry->mask_base                = base;
> >
> >                 addr = pci_msix_desc_addr(entry);
> > @@ -964,8 +961,6 @@ static void pci_msi_shutdown(struct pci_dev *dev)
> >         mask = msi_mask(desc->msi_attrib.multi_cap);
> >         msi_mask_irq(desc, mask, 0);
> >
> > -       /* Restore dev->irq to its default pin-assertion IRQ */
> > -       dev->irq = desc->msi_attrib.default_irq;
> >         pcibios_alloc_irq(dev);
> >  }
> >
> > diff --git a/include/linux/msi.h b/include/linux/msi.h
> > index e8bdcb83172b..a631664c1c38 100644
> > --- a/include/linux/msi.h
> > +++ b/include/linux/msi.h
> > @@ -114,7 +114,6 @@ struct ti_sci_inta_msi_desc {
> >   * @maskbit:   [PCI MSI/X] Mask-Pending bit supported?
> >   * @is_64:     [PCI MSI/X] Address size: 0=32bit 1=64bit
> >   * @entry_nr:  [PCI MSI/X] Entry which is described by this descriptor
> > - * @default_irq:[PCI MSI/X] The default pre-assigned non-MSI irq
> >   * @mask_pos:  [PCI MSI]   Mask register position
> >   * @mask_base: [PCI MSI-X] Mask register base address
> >   * @platform:  [platform]  Platform device specific msi descriptor data
> > @@ -148,7 +147,6 @@ struct msi_desc {
> >                                 u8      is_64           : 1;
> >                                 u8      is_virtual      : 1;
> >                                 u16     entry_nr;
> > -                               unsigned default_irq;
> >                         } msi_attrib;
> >                         union {
> >                                 u8      mask_pos;
> >
> > Thanks,
> >
> >         M.
> >
> > --
> > Without deviation from the norm, progress is not possible.
>

Thanks
barry

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/2] PCI/MSI: Fix the confusing IRQ sysfs ABI for MSI-X
  2021-08-21 22:14       ` Barry Song
  2021-08-21 22:41         ` Barry Song
@ 2021-08-23 10:30         ` Marc Zyngier
  2021-08-23 11:03           ` Barry Song
  1 sibling, 1 reply; 19+ messages in thread
From: Marc Zyngier @ 2021-08-23 10:30 UTC (permalink / raw)
  To: Barry Song
  Cc: Bjorn Helgaas, Bjorn Helgaas, Jonathan Corbet, Jonathan.Cameron,
	bilbao, Greg Kroah-Hartman, leon, LKML, linux-pci, Linuxarm,
	luzmaximilian, mchehab+huawei, schnelle, Barry Song,
	Thomas Gleixner

On Sat, 21 Aug 2021 23:14:35 +0100,
Barry Song <21cnbao@gmail.com> wrote:
> 
> On Sat, Aug 21, 2021 at 10:42 PM Marc Zyngier <maz@kernel.org> wrote:
> >
> > Hi Bjorn,
> >
> > On Sat, 21 Aug 2021 00:33:28 +0100,
> > Bjorn Helgaas <helgaas@kernel.org> wrote:
> > >

[...]

> > >     In msix_setup_entries(), we get nvecs msi_entry structs, and we
> > >     get a saved .default_irq in each one?
> >
> > That's a key point.
> >
> > Old-school PCI/MSI is represented by a single interrupt, and you
> > *could* somehow make it relatively easy for drivers that only
> > understand INTx to migrate to MSI if you replaced whatever is held in
> > dev->irq (which should only represent the INTx mapping) with the MSI
> > interrupt number. Which I guess is what the MSI code is doing.
> >
> > This is the 21st century, and nobody should ever rely on such horror,
> > but I'm sure we do have such drivers in the tree. Boo.
> >
> > However, this *cannot* hold true for Multi-MSI, nor MSI-X, because
> > there is a plurality of interrupts. Even worse, for MSI-X, there is
> > zero guarantee that the allocated interrupts will be in a contiguous
> > space.
> >
> > Given that, what is dev->irq good for? "Absolutely Nothing! (say it
> > again!)".
> >
> 
> The only thing is that dev->irq is an sysfs ABI to userspace. Due to
> the inconsistency between legacy PCI INTx, MSI, MSI-X, this ABI
> should have been absolutely broken nowadays.  This is actually what
> the patchset was originally aiming at to fix.

I do not think we should expose more of a broken abstraction to
userspace. We will have to carry on exposing the first MSI in this
field forever, but it doesn't mean we should have to do it for MSI-X.

> One more question from me is that does dev->irq actually hold any
> valid hardware INTx information while hardware is using MSI-X? At
> least in my hardware, sysfs ABI for PCI is all "0".

That's probably because nothing actually configured the interrupt, or
that there is no INTx implementation. I have that on systems with
pretty dodgy (or incomplete) firmware.

> root@ubuntu:/sys/devices/pci0000:7c/0000:7c:00.0/0000:7d:00.3# cat irq
> 0
> 
> root@ubuntu:/sys/devices/pci0000:7c/0000:7c:00.0/0000:7d:00.3# ls -l msi_irqs/*
> -r--r--r-- 1 root root 4096 Aug 21 22:04 msi_irqs/499
> -r--r--r-- 1 root root 4096 Aug 21 22:04 msi_irqs/500
> -r--r--r-- 1 root root 4096 Aug 21 22:04 msi_irqs/501
> ...
> root@ubuntu:/sys/devices/pci0000:7c/0000:7c:00.0/0000:7d:00.3# cat msi_irqs/499
> msix
> 
> Not quite sure how it is going on different hardware platforms.

My D05 does that as well, and it doesn't expose any INTx support.

> 
> > MSI-X is not something you can "accidentally" use. You have to
> > actively embrace it. In all honesty, this patch tries to move in the
> > wrong direction. If anything, we should kill this hack altogether and
> > fix the (handful of?) drivers that rely on it. That'd actually be a
> > good way to find whether they are still worth keeping in the tree. And
> > if it breaks too many of them, then at least we'll know where we
> > stand.
> >
> > I'd be tempted to leave the below patch simmer in -next for a few
> > weeks and see if how many people shout:
> 
> This looks like a more proper direction to go.
> but here i am wondering how sysfs ABI document should follow the below change
> doc is patch 2/2:
> https://lore.kernel.org/lkml/20210820223744.8439-3-21cnbao@gmail.com/
> 
> On the other hand, my feeling is that nobody should depend on sysfs
> irq entry nowadays.

Too late. It is there, and we need to preserve it. I just don't think
feeding it more erroneous information is the right thing to do.

My patch was only dealing with the kernel side of things, not the
userspace ABI. That ABI should be carried on unchanged.


> For example, userspace irqbalance is actually using
> /sys/devices/.../msi_irqs/ So probably we should set this ABI
> invisible when devices are using MSI or MSI-X?

Can it actually be made optional? I don't believe we can.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/2] PCI/MSI: Fix the confusing IRQ sysfs ABI for MSI-X
  2021-08-21 22:41         ` Barry Song
@ 2021-08-23 10:33           ` Marc Zyngier
  2021-08-24 19:25           ` Bjorn Helgaas
  1 sibling, 0 replies; 19+ messages in thread
From: Marc Zyngier @ 2021-08-23 10:33 UTC (permalink / raw)
  To: Barry Song
  Cc: Bjorn Helgaas, Bjorn Helgaas, Jonathan Corbet, Jonathan.Cameron,
	bilbao, Greg Kroah-Hartman, leon, LKML, linux-pci, Linuxarm,
	luzmaximilian, mchehab+huawei, schnelle, Barry Song,
	Thomas Gleixner

On Sat, 21 Aug 2021 23:41:17 +0100,
Barry Song <21cnbao@gmail.com> wrote:

[...]

> > So probably we should set this ABI invisible when devices are using
> > MSI or MSI-X?
> 
> i mean something like the below,
> 
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index 5d63df7..1323841 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -26,6 +26,7 @@
>  #include <linux/slab.h>
>  #include <linux/vgaarb.h>
>  #include <linux/pm_runtime.h>
> +#include <linux/msi.h>
>  #include <linux/of.h>
>  #include "pci.h"
> 
> @@ -1437,6 +1438,16 @@ static umode_t pci_dev_attrs_are_visible(struct
> kobject *kobj,
>                 if ((pdev->class >> 8) != PCI_CLASS_DISPLAY_VGA)
>                         return 0;
> 
> +#ifdef CONFIG_PCI_MSI
> +       /*
> +        * if devices are MSI and MSI-X, IRQ sysfs ABI is meaningless
> +        * and broken
> +        */
> +       if (a == &dev_attr_irq.attr)
> +               if (first_pci_msi_entry(pdev))
> +                       return 0;
> +#endif
> +
>         return a->mode;
>  }

I don't think you can break what we have today. Whatever change we
make to the kernel internals, the userspace view must stay unchanged.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/2] PCI/MSI: Fix the confusing IRQ sysfs ABI for MSI-X
  2021-08-23 10:30         ` Marc Zyngier
@ 2021-08-23 11:03           ` Barry Song
  2021-08-23 11:28             ` Marc Zyngier
  0 siblings, 1 reply; 19+ messages in thread
From: Barry Song @ 2021-08-23 11:03 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Bjorn Helgaas, Bjorn Helgaas, Jonathan Corbet, Jonathan.Cameron,
	bilbao, Greg Kroah-Hartman, leon, LKML, linux-pci, Linuxarm,
	luzmaximilian, mchehab+huawei, schnelle, Barry Song,
	Thomas Gleixner

On Mon, Aug 23, 2021 at 10:30 PM Marc Zyngier <maz@kernel.org> wrote:
>
> On Sat, 21 Aug 2021 23:14:35 +0100,
> Barry Song <21cnbao@gmail.com> wrote:
> >
> > On Sat, Aug 21, 2021 at 10:42 PM Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > Hi Bjorn,
> > >
> > > On Sat, 21 Aug 2021 00:33:28 +0100,
> > > Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > >
>
> [...]
>
> > > >     In msix_setup_entries(), we get nvecs msi_entry structs, and we
> > > >     get a saved .default_irq in each one?
> > >
> > > That's a key point.
> > >
> > > Old-school PCI/MSI is represented by a single interrupt, and you
> > > *could* somehow make it relatively easy for drivers that only
> > > understand INTx to migrate to MSI if you replaced whatever is held in
> > > dev->irq (which should only represent the INTx mapping) with the MSI
> > > interrupt number. Which I guess is what the MSI code is doing.
> > >
> > > This is the 21st century, and nobody should ever rely on such horror,
> > > but I'm sure we do have such drivers in the tree. Boo.
> > >
> > > However, this *cannot* hold true for Multi-MSI, nor MSI-X, because
> > > there is a plurality of interrupts. Even worse, for MSI-X, there is
> > > zero guarantee that the allocated interrupts will be in a contiguous
> > > space.
> > >
> > > Given that, what is dev->irq good for? "Absolutely Nothing! (say it
> > > again!)".
> > >
> >
> > The only thing is that dev->irq is an sysfs ABI to userspace. Due to
> > the inconsistency between legacy PCI INTx, MSI, MSI-X, this ABI
> > should have been absolutely broken nowadays.  This is actually what
> > the patchset was originally aiming at to fix.
>
> I do not think we should expose more of a broken abstraction to
> userspace. We will have to carry on exposing the first MSI in this
> field forever, but it doesn't mean we should have to do it for MSI-X.
>
> > One more question from me is that does dev->irq actually hold any
> > valid hardware INTx information while hardware is using MSI-X? At
> > least in my hardware, sysfs ABI for PCI is all "0".
>
> That's probably because nothing actually configured the interrupt, or
> that there is no INTx implementation. I have that on systems with
> pretty dodgy (or incomplete) firmware.
>
> > root@ubuntu:/sys/devices/pci0000:7c/0000:7c:00.0/0000:7d:00.3# cat irq
> > 0
> >
> > root@ubuntu:/sys/devices/pci0000:7c/0000:7c:00.0/0000:7d:00.3# ls -l msi_irqs/*
> > -r--r--r-- 1 root root 4096 Aug 21 22:04 msi_irqs/499
> > -r--r--r-- 1 root root 4096 Aug 21 22:04 msi_irqs/500
> > -r--r--r-- 1 root root 4096 Aug 21 22:04 msi_irqs/501
> > ...
> > root@ubuntu:/sys/devices/pci0000:7c/0000:7c:00.0/0000:7d:00.3# cat msi_irqs/499
> > msix
> >
> > Not quite sure how it is going on different hardware platforms.
>
> My D05 does that as well, and it doesn't expose any INTx support.
>
> >
> > > MSI-X is not something you can "accidentally" use. You have to
> > > actively embrace it. In all honesty, this patch tries to move in the
> > > wrong direction. If anything, we should kill this hack altogether and
> > > fix the (handful of?) drivers that rely on it. That'd actually be a
> > > good way to find whether they are still worth keeping in the tree. And
> > > if it breaks too many of them, then at least we'll know where we
> > > stand.
> > >
> > > I'd be tempted to leave the below patch simmer in -next for a few
> > > weeks and see if how many people shout:
> >
> > This looks like a more proper direction to go.
> > but here i am wondering how sysfs ABI document should follow the below change
> > doc is patch 2/2:
> > https://lore.kernel.org/lkml/20210820223744.8439-3-21cnbao@gmail.com/
> >
> > On the other hand, my feeling is that nobody should depend on sysfs
> > irq entry nowadays.
>
> Too late. It is there, and we need to preserve it. I just don't think
> feeding it more erroneous information is the right thing to do.
>
> My patch was only dealing with the kernel side of things, not the
> userspace ABI. That ABI should be carried on unchanged.

it seems this isn't true. your patch is also changing userspace ABI as
long as you change pci_dev->irq
which will be shown in sysfs irq entry.

if we don't want to change the behaviour of any existing ABI, it seems
the only thing we can do here
to document it well in ABI doc. i actually doubt anyone has really
understood what the irq entry
is really showing.

>
>
> > For example, userspace irqbalance is actually using
> > /sys/devices/.../msi_irqs/ So probably we should set this ABI
> > invisible when devices are using MSI or MSI-X?
>
> Can it actually be made optional? I don't believe we can.
>
> Thanks,
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.

Thanks
barry

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/2] PCI/MSI: Fix the confusing IRQ sysfs ABI for MSI-X
  2021-08-23 11:03           ` Barry Song
@ 2021-08-23 11:28             ` Marc Zyngier
  2021-08-23 22:46               ` Barry Song
  0 siblings, 1 reply; 19+ messages in thread
From: Marc Zyngier @ 2021-08-23 11:28 UTC (permalink / raw)
  To: Barry Song
  Cc: Bjorn Helgaas, Bjorn Helgaas, Jonathan Corbet, Jonathan.Cameron,
	bilbao, Greg Kroah-Hartman, leon, LKML, linux-pci, Linuxarm,
	luzmaximilian, mchehab+huawei, schnelle, Barry Song,
	Thomas Gleixner

On Mon, 23 Aug 2021 12:03:08 +0100,
Barry Song <21cnbao@gmail.com> wrote:
> 
> On Mon, Aug 23, 2021 at 10:30 PM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Sat, 21 Aug 2021 23:14:35 +0100,
> > Barry Song <21cnbao@gmail.com> wrote:
> > >
> > > On Sat, Aug 21, 2021 at 10:42 PM Marc Zyngier <maz@kernel.org> wrote:
> > > >
> > > > Hi Bjorn,
> > > >
> > > > On Sat, 21 Aug 2021 00:33:28 +0100,
> > > > Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > >
> >
> > [...]
> >
> > > > >     In msix_setup_entries(), we get nvecs msi_entry structs, and we
> > > > >     get a saved .default_irq in each one?
> > > >
> > > > That's a key point.
> > > >
> > > > Old-school PCI/MSI is represented by a single interrupt, and you
> > > > *could* somehow make it relatively easy for drivers that only
> > > > understand INTx to migrate to MSI if you replaced whatever is held in
> > > > dev->irq (which should only represent the INTx mapping) with the MSI
> > > > interrupt number. Which I guess is what the MSI code is doing.
> > > >
> > > > This is the 21st century, and nobody should ever rely on such horror,
> > > > but I'm sure we do have such drivers in the tree. Boo.
> > > >
> > > > However, this *cannot* hold true for Multi-MSI, nor MSI-X, because
> > > > there is a plurality of interrupts. Even worse, for MSI-X, there is
> > > > zero guarantee that the allocated interrupts will be in a contiguous
> > > > space.
> > > >
> > > > Given that, what is dev->irq good for? "Absolutely Nothing! (say it
> > > > again!)".
> > > >
> > >
> > > The only thing is that dev->irq is an sysfs ABI to userspace. Due to
> > > the inconsistency between legacy PCI INTx, MSI, MSI-X, this ABI
> > > should have been absolutely broken nowadays.  This is actually what
> > > the patchset was originally aiming at to fix.
> >
> > I do not think we should expose more of a broken abstraction to
> > userspace. We will have to carry on exposing the first MSI in this
> > field forever, but it doesn't mean we should have to do it for MSI-X.
> >
> > > One more question from me is that does dev->irq actually hold any
> > > valid hardware INTx information while hardware is using MSI-X? At
> > > least in my hardware, sysfs ABI for PCI is all "0".
> >
> > That's probably because nothing actually configured the interrupt, or
> > that there is no INTx implementation. I have that on systems with
> > pretty dodgy (or incomplete) firmware.
> >
> > > root@ubuntu:/sys/devices/pci0000:7c/0000:7c:00.0/0000:7d:00.3# cat irq
> > > 0
> > >
> > > root@ubuntu:/sys/devices/pci0000:7c/0000:7c:00.0/0000:7d:00.3# ls -l msi_irqs/*
> > > -r--r--r-- 1 root root 4096 Aug 21 22:04 msi_irqs/499
> > > -r--r--r-- 1 root root 4096 Aug 21 22:04 msi_irqs/500
> > > -r--r--r-- 1 root root 4096 Aug 21 22:04 msi_irqs/501
> > > ...
> > > root@ubuntu:/sys/devices/pci0000:7c/0000:7c:00.0/0000:7d:00.3# cat msi_irqs/499
> > > msix
> > >
> > > Not quite sure how it is going on different hardware platforms.
> >
> > My D05 does that as well, and it doesn't expose any INTx support.
> >
> > >
> > > > MSI-X is not something you can "accidentally" use. You have to
> > > > actively embrace it. In all honesty, this patch tries to move in the
> > > > wrong direction. If anything, we should kill this hack altogether and
> > > > fix the (handful of?) drivers that rely on it. That'd actually be a
> > > > good way to find whether they are still worth keeping in the tree. And
> > > > if it breaks too many of them, then at least we'll know where we
> > > > stand.
> > > >
> > > > I'd be tempted to leave the below patch simmer in -next for a few
> > > > weeks and see if how many people shout:
> > >
> > > This looks like a more proper direction to go.
> > > but here i am wondering how sysfs ABI document should follow the below change
> > > doc is patch 2/2:
> > > https://lore.kernel.org/lkml/20210820223744.8439-3-21cnbao@gmail.com/
> > >
> > > On the other hand, my feeling is that nobody should depend on sysfs
> > > irq entry nowadays.
> >
> > Too late. It is there, and we need to preserve it. I just don't think
> > feeding it more erroneous information is the right thing to do.
> >
> > My patch was only dealing with the kernel side of things, not the
> > userspace ABI. That ABI should be carried on unchanged.
> 
> it seems this isn't true. your patch is also changing userspace ABI
> as long as you change pci_dev->irq which will be shown in sysfs irq
> entry.

I guess I wasn't clear enough above. Let me rephrase this:

My patch was only dealing with the kernel side of things, not the
userspace ABI. That ABI should be carried on unchanged, which requires
additional changes in the sysfs code.

> if we don't want to change the behaviour of any existing ABI, it
> seems the only thing we can do here to document it well in ABI
> doc. i actually doubt anyone has really understood what the irq
> entry is really showing.

Given that we can't prove that it is actually the case, I believe this
is the only option.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/2] PCI/MSI: Fix the confusing IRQ sysfs ABI for MSI-X
  2021-08-23 11:28             ` Marc Zyngier
@ 2021-08-23 22:46               ` Barry Song
  2021-08-24 19:34                 ` Bjorn Helgaas
  0 siblings, 1 reply; 19+ messages in thread
From: Barry Song @ 2021-08-23 22:46 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Bjorn Helgaas, Bjorn Helgaas, Jonathan Corbet, Jonathan.Cameron,
	bilbao, Greg Kroah-Hartman, leon, LKML, linux-pci, Linuxarm,
	luzmaximilian, mchehab+huawei, schnelle, Barry Song,
	Thomas Gleixner

On Mon, Aug 23, 2021 at 11:28 PM Marc Zyngier <maz@kernel.org> wrote:
>
> On Mon, 23 Aug 2021 12:03:08 +0100,
> Barry Song <21cnbao@gmail.com> wrote:
> >
> > On Mon, Aug 23, 2021 at 10:30 PM Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > On Sat, 21 Aug 2021 23:14:35 +0100,
> > > Barry Song <21cnbao@gmail.com> wrote:
> > > >
> > > > On Sat, Aug 21, 2021 at 10:42 PM Marc Zyngier <maz@kernel.org> wrote:
> > > > >
> > > > > Hi Bjorn,
> > > > >
> > > > > On Sat, 21 Aug 2021 00:33:28 +0100,
> > > > > Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > >
> > >
> > > [...]
> > >
> > > > > >     In msix_setup_entries(), we get nvecs msi_entry structs, and we
> > > > > >     get a saved .default_irq in each one?
> > > > >
> > > > > That's a key point.
> > > > >
> > > > > Old-school PCI/MSI is represented by a single interrupt, and you
> > > > > *could* somehow make it relatively easy for drivers that only
> > > > > understand INTx to migrate to MSI if you replaced whatever is held in
> > > > > dev->irq (which should only represent the INTx mapping) with the MSI
> > > > > interrupt number. Which I guess is what the MSI code is doing.
> > > > >
> > > > > This is the 21st century, and nobody should ever rely on such horror,
> > > > > but I'm sure we do have such drivers in the tree. Boo.
> > > > >
> > > > > However, this *cannot* hold true for Multi-MSI, nor MSI-X, because
> > > > > there is a plurality of interrupts. Even worse, for MSI-X, there is
> > > > > zero guarantee that the allocated interrupts will be in a contiguous
> > > > > space.
> > > > >
> > > > > Given that, what is dev->irq good for? "Absolutely Nothing! (say it
> > > > > again!)".
> > > > >
> > > >
> > > > The only thing is that dev->irq is an sysfs ABI to userspace. Due to
> > > > the inconsistency between legacy PCI INTx, MSI, MSI-X, this ABI
> > > > should have been absolutely broken nowadays.  This is actually what
> > > > the patchset was originally aiming at to fix.
> > >
> > > I do not think we should expose more of a broken abstraction to
> > > userspace. We will have to carry on exposing the first MSI in this
> > > field forever, but it doesn't mean we should have to do it for MSI-X.
> > >
> > > > One more question from me is that does dev->irq actually hold any
> > > > valid hardware INTx information while hardware is using MSI-X? At
> > > > least in my hardware, sysfs ABI for PCI is all "0".
> > >
> > > That's probably because nothing actually configured the interrupt, or
> > > that there is no INTx implementation. I have that on systems with
> > > pretty dodgy (or incomplete) firmware.
> > >
> > > > root@ubuntu:/sys/devices/pci0000:7c/0000:7c:00.0/0000:7d:00.3# cat irq
> > > > 0
> > > >
> > > > root@ubuntu:/sys/devices/pci0000:7c/0000:7c:00.0/0000:7d:00.3# ls -l msi_irqs/*
> > > > -r--r--r-- 1 root root 4096 Aug 21 22:04 msi_irqs/499
> > > > -r--r--r-- 1 root root 4096 Aug 21 22:04 msi_irqs/500
> > > > -r--r--r-- 1 root root 4096 Aug 21 22:04 msi_irqs/501
> > > > ...
> > > > root@ubuntu:/sys/devices/pci0000:7c/0000:7c:00.0/0000:7d:00.3# cat msi_irqs/499
> > > > msix
> > > >
> > > > Not quite sure how it is going on different hardware platforms.
> > >
> > > My D05 does that as well, and it doesn't expose any INTx support.
> > >
> > > >
> > > > > MSI-X is not something you can "accidentally" use. You have to
> > > > > actively embrace it. In all honesty, this patch tries to move in the
> > > > > wrong direction. If anything, we should kill this hack altogether and
> > > > > fix the (handful of?) drivers that rely on it. That'd actually be a
> > > > > good way to find whether they are still worth keeping in the tree. And
> > > > > if it breaks too many of them, then at least we'll know where we
> > > > > stand.
> > > > >
> > > > > I'd be tempted to leave the below patch simmer in -next for a few
> > > > > weeks and see if how many people shout:
> > > >
> > > > This looks like a more proper direction to go.
> > > > but here i am wondering how sysfs ABI document should follow the below change
> > > > doc is patch 2/2:
> > > > https://lore.kernel.org/lkml/20210820223744.8439-3-21cnbao@gmail.com/
> > > >
> > > > On the other hand, my feeling is that nobody should depend on sysfs
> > > > irq entry nowadays.
> > >
> > > Too late. It is there, and we need to preserve it. I just don't think
> > > feeding it more erroneous information is the right thing to do.
> > >
> > > My patch was only dealing with the kernel side of things, not the
> > > userspace ABI. That ABI should be carried on unchanged.
> >
> > it seems this isn't true. your patch is also changing userspace ABI
> > as long as you change pci_dev->irq which will be shown in sysfs irq
> > entry.
>
> I guess I wasn't clear enough above. Let me rephrase this:
>
> My patch was only dealing with the kernel side of things, not the
> userspace ABI. That ABI should be carried on unchanged, which requires
> additional changes in the sysfs code.

Thanks for clarification. It seems you mean something like below, am I right?

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 5d63df7..0fa7a16 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -26,6 +26,7 @@
 #include <linux/slab.h>
 #include <linux/vgaarb.h>
 #include <linux/pm_runtime.h>
+#include <linux/msi.h>
 #include <linux/of.h>
 #include "pci.h"

@@ -49,7 +50,23 @@ static DEVICE_ATTR_RO(field)
 pci_config_attr(subsystem_device, "0x%04x\n");
 pci_config_attr(revision, "0x%02x\n");
 pci_config_attr(class, "0x%06x\n");
-pci_config_attr(irq, "%u\n");
+
+static ssize_t irq_show(struct device *dev,
+                                        struct device_attribute *attr,
+                                        char *buf)
+{
+       struct pci_dev *pdev = to_pci_dev(dev);
+#ifdef CONFIG_PCI_MSI
+       struct msi_desc *desc = first_pci_msi_entry(pdev);
+
+       /* for MSI, return the 1st IRQ in IRQ vector */
+       if (desc && !desc->msi_attrib.is_msix)
+               return sysfs_emit(buf, "%u\n", desc->irq);
+#endif
+
+       return sysfs_emit(buf, "%u\n", pdev->irq);
+}
+static DEVICE_ATTR_RO(irq);

 static ssize_t broken_parity_status_show(struct device *dev,
                                         struct device_attribute *attr,


>
> > if we don't want to change the behaviour of any existing ABI, it
> > seems the only thing we can do here to document it well in ABI
> > doc. i actually doubt anyone has really understood what the irq
> > entry is really showing.
>
> Given that we can't prove that it is actually the case, I believe this
> is the only option.

we have to document the ABI like below though it seems quite annoying.

1. for devices which don't support MSI and MSI-X, show legacy INTx
2. for devices which support MSI
    a. if CONFIG_PCI_MSI is not enabled,  show legacy INTx
    b. if CONFIG_PCI_MSI is enabled and devices are using MSI at this
moment, show 1st IRQ in the vector
    c. if CONFIG_PCI_MSI is enabled, but we shutdown its MSI before
the users call sysfs entry,
        so at this moment, devices are not using MSI,  show legacy INTx
3. for devices which support MSI-X, no matter if it is using MSI-X,
    show legacy INTx
4. In Addition, INTx might be broken due to incomplete firmware or
hardware design for MSI and MSI-X cases

To be honest, it sounds like a disaster :-) but if this is what we
have to do, I'd like to try it in v3.

>
> Thanks,
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.

Thanks
Barry

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/2] PCI/MSI: Fix the confusing IRQ sysfs ABI for MSI-X
  2021-08-21 22:41         ` Barry Song
  2021-08-23 10:33           ` Marc Zyngier
@ 2021-08-24 19:25           ` Bjorn Helgaas
  1 sibling, 0 replies; 19+ messages in thread
From: Bjorn Helgaas @ 2021-08-24 19:25 UTC (permalink / raw)
  To: Barry Song
  Cc: Marc Zyngier, Bjorn Helgaas, Jonathan Corbet, Jonathan.Cameron,
	bilbao, Greg Kroah-Hartman, leon, LKML, linux-pci, Linuxarm,
	luzmaximilian, mchehab+huawei, schnelle, Barry Song,
	Thomas Gleixner

On Sun, Aug 22, 2021 at 10:41:17AM +1200, Barry Song wrote:
> On Sun, Aug 22, 2021 at 10:14 AM Barry Song <21cnbao@gmail.com> wrote:
> >
> > On Sat, Aug 21, 2021 at 10:42 PM Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > Hi Bjorn,
> > >
> > > On Sat, 21 Aug 2021 00:33:28 +0100,
> > > Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > >
> > > > [+cc Thomas, Marc]
> > > >
> > > > On Sat, Aug 21, 2021 at 10:37:43AM +1200, Barry Song wrote:
> > > > > From: Barry Song <song.bao.hua@hisilicon.com>
> > > > >
> > > > > /sys/bus/pci/devices/.../irq sysfs ABI is very confusing at this
> > > > > moment especially for MSI-X cases.
> > > >
> > > > AFAICT this patch *only* affects MSI-X.  So are you saying the sysfs
> > > > ABI is fine for MSI but confusing for MSI-X?
> > > >
> > > > > While MSI sets IRQ to the first
> > > > > number in the vector, MSI-X does nothing for this though it saves
> > > > > default_irq in msix_setup_entries(). Weird the saved default_irq
> > > > > for MSI-X is never used in pci_msix_shutdown(), which is quite
> > > > > different with pci_msi_shutdown(). Thus, this patch moves to show
> > > > > the first IRQ number which is from the first msi_entry for MSI-X.
> > > > > Hopefully, this can make IRQ ABI more clear and more consistent.
> > > > >
> > > > > Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > > > Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
> > > > > ---
> > > > >  drivers/pci/msi.c | 6 ++++++
> > > > >  1 file changed, 6 insertions(+)
> > > > >
> > > > > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> > > > > index 9232255..6bbf81b 100644
> > > > > --- a/drivers/pci/msi.c
> > > > > +++ b/drivers/pci/msi.c
> > > > > @@ -771,6 +771,7 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
> > > > >     int ret;
> > > > >     u16 control;
> > > > >     void __iomem *base;
> > > > > +   struct msi_desc *desc;
> > > > >
> > > > >     /* Ensure MSI-X is disabled while it is set up */
> > > > >     pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0);
> > > > > @@ -814,6 +815,10 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
> > > > >     pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
> > > > >
> > > > >     pcibios_free_irq(dev);
> > > > > +
> > > > > +   desc = first_pci_msi_entry(dev);
> > > > > +   dev->irq = desc->irq;
> > > >
> > > > This change is not primarily about sysfs.  This is about changing
> > > > "dev->irq" when MSI-X is enabled, and it's only incidental that sysfs
> > > > reflects that.
> > > >
> > > > So we need to know the effect of changing dev->irq.  Drivers may use
> > > > the value of dev->irq, and I'm *guessing* this change shouldn't break
> > > > them since we already do this for MSI, but I'd like some more expert
> > > > opinion than mine :)
> > > >
> > > > For MSI we have:
> > > >
> > > >   msi_capability_init
> > > >     msi_setup_entry
> > > >       entry = alloc_msi_entry(nvec)
> > > >       entry->msi_attrib.default_irq = dev->irq;     /* Save IOAPIC IRQ */
> > > >     dev->irq = entry->irq;
> > > >
> > > >   pci_msi_shutdown
> > > >     /* Restore dev->irq to its default pin-assertion IRQ */
> > > >     dev->irq = desc->msi_attrib.default_irq;
> > > >
> > > > and for MSI-X we have:
> > > >
> > > >   msix_capability_init
> > > >     msix_setup_entries
> > > >       for (i = 0; i < nvec; i++)
> > > >         entry = alloc_msi_entry(1)
> > > >       entry->msi_attrib.default_irq = dev->irq;
> > > >
> > > >   pci_msix_shutdown
> > > >     for_each_pci_msi_entry(entry, dev)
> > > >       __pci_msix_desc_mask_irq
> > > > +   dev->irq = entry->msi_attrib.default_irq;   # added by this patch
> > > >
> > > >
> > > > Things that seem strange to me:
> > > >
> > > >   - The msi_setup_entry() comment "Save IOAPIC IRQ" seems needlessly
> > > >     specific; maybe it should be "INTx IRQ".
> > > >
> > > >   - The pci_msi_shutdown() comment "Restore ... pin-assertion IRQ"
> > > >     should match the msi_setup_entry() one, e.g., maybe it should also
> > > >     be "INTx IRQ".  There are no INTx or IOAPIC pins in PCIe.
> > > >
> > > >   - The only use of .default_irq is to save and restore dev->irq, so
> > > >     it looks like a per-device thing, not a per-vector thing.
> > > >
> > > >     In msi_setup_entry() there's only one msi_entry, so there's only
> > > >     one saved .default_irq.
> > > >
> > > >     In msix_setup_entries(), we get nvecs msi_entry structs, and we
> > > >     get a saved .default_irq in each one?
> > >
> > > That's a key point.
> > >
> > > Old-school PCI/MSI is represented by a single interrupt, and you
> > > *could* somehow make it relatively easy for drivers that only
> > > understand INTx to migrate to MSI if you replaced whatever is held in
> > > dev->irq (which should only represent the INTx mapping) with the MSI
> > > interrupt number. Which I guess is what the MSI code is doing.
> > >
> > > This is the 21st century, and nobody should ever rely on such horror,
> > > but I'm sure we do have such drivers in the tree. Boo.
> > >
> > > However, this *cannot* hold true for Multi-MSI, nor MSI-X, because
> > > there is a plurality of interrupts. Even worse, for MSI-X, there is
> > > zero guarantee that the allocated interrupts will be in a contiguous
> > > space.
> > >
> > > Given that, what is dev->irq good for? "Absolutely Nothing! (say it
> > > again!)".
> > >
> >
> > The only thing is that dev->irq is an sysfs ABI to userspace. Due to
> > the inconsistency
> > between legacy PCI INTx, MSI, MSI-X, this ABI should have been
> > absolutely broken nowadays.
> > This is actually what the patchset was originally aiming at to fix.
> >
> > One more question from me is that does dev->irq actually hold any
> > valid hardware INTx
> > information while hardware is using MSI-X? At least in my hardware,
> > sysfs ABI for PCI is all "0".
> >
> > root@ubuntu:/sys/devices/pci0000:7c/0000:7c:00.0/0000:7d:00.3# cat irq
> > 0
> >
> > root@ubuntu:/sys/devices/pci0000:7c/0000:7c:00.0/0000:7d:00.3# ls -l msi_irqs/*
> > -r--r--r-- 1 root root 4096 Aug 21 22:04 msi_irqs/499
> > -r--r--r-- 1 root root 4096 Aug 21 22:04 msi_irqs/500
> > -r--r--r-- 1 root root 4096 Aug 21 22:04 msi_irqs/501
> > ...
> > root@ubuntu:/sys/devices/pci0000:7c/0000:7c:00.0/0000:7d:00.3# cat msi_irqs/499
> > msix
> >
> > Not quite sure how it is going on different hardware platforms.
> >
> > > MSI-X is not something you can "accidentally" use. You have to
> > > actively embrace it. In all honesty, this patch tries to move in the
> > > wrong direction. If anything, we should kill this hack altogether and
> > > fix the (handful of?) drivers that rely on it. That'd actually be a
> > > good way to find whether they are still worth keeping in the tree. And
> > > if it breaks too many of them, then at least we'll know where we
> > > stand.
> > >
> > > I'd be tempted to leave the below patch simmer in -next for a few
> > > weeks and see if how many people shout:
> >
> > This looks like a more proper direction to go.
> > but here i am wondering how sysfs ABI document should follow the below change
> > doc is patch 2/2:
> > https://lore.kernel.org/lkml/20210820223744.8439-3-21cnbao@gmail.com/
> >
> > On the other hand, my feeling is that nobody should depend on sysfs
> > irq entry nowadays.
> > For example, userspace irqbalance is actually using /sys/devices/.../msi_irqs/
> > So probably we should set this ABI invisible when devices are using
> > MSI or MSI-X?
> 
> i mean something like the below,
> 
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index 5d63df7..1323841 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -26,6 +26,7 @@
>  #include <linux/slab.h>
>  #include <linux/vgaarb.h>
>  #include <linux/pm_runtime.h>
> +#include <linux/msi.h>
>  #include <linux/of.h>
>  #include "pci.h"
> 
> @@ -1437,6 +1438,16 @@ static umode_t pci_dev_attrs_are_visible(struct
> kobject *kobj,
>                 if ((pdev->class >> 8) != PCI_CLASS_DISPLAY_VGA)
>                         return 0;
> 
> +#ifdef CONFIG_PCI_MSI
> +       /*
> +        * if devices are MSI and MSI-X, IRQ sysfs ABI is meaningless
> +        * and broken
> +        */
> +       if (a == &dev_attr_irq.attr)
> +               if (first_pci_msi_entry(pdev))
> +                       return 0;
> +#endif
> +
>         return a->mode;
>  }

I think this idea has been discarded anyway, but sysfs doesn't work
this way.  The .is_visible() function is evaluated once at
device_add()-time, i.e., during enumeration, so there's no way to
dynamically change the visibility as the driver enables/disables
MSI.

I *wish* sysfs had more flexibility like this, though.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/2] PCI/MSI: Fix the confusing IRQ sysfs ABI for MSI-X
  2021-08-23 22:46               ` Barry Song
@ 2021-08-24 19:34                 ` Bjorn Helgaas
  2021-08-25  9:45                   ` Marc Zyngier
  0 siblings, 1 reply; 19+ messages in thread
From: Bjorn Helgaas @ 2021-08-24 19:34 UTC (permalink / raw)
  To: Barry Song
  Cc: Marc Zyngier, Bjorn Helgaas, Jonathan Corbet, Jonathan.Cameron,
	bilbao, Greg Kroah-Hartman, leon, LKML, linux-pci, Linuxarm,
	luzmaximilian, mchehab+huawei, schnelle, Barry Song,
	Thomas Gleixner

On Tue, Aug 24, 2021 at 10:46:59AM +1200, Barry Song wrote:
> On Mon, Aug 23, 2021 at 11:28 PM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Mon, 23 Aug 2021 12:03:08 +0100,
> > Barry Song <21cnbao@gmail.com> wrote:

> +static ssize_t irq_show(struct device *dev,
> +                                        struct device_attribute *attr,
> +                                        char *buf)
> +{
> +       struct pci_dev *pdev = to_pci_dev(dev);
> +#ifdef CONFIG_PCI_MSI
> +       struct msi_desc *desc = first_pci_msi_entry(pdev);
> +
> +       /* for MSI, return the 1st IRQ in IRQ vector */
> +       if (desc && !desc->msi_attrib.is_msix)
> +               return sysfs_emit(buf, "%u\n", desc->irq);
> +#endif
> +
> +       return sysfs_emit(buf, "%u\n", pdev->irq);
> +}
> +static DEVICE_ATTR_RO(irq);

Makes sense to me.  And with Marc's patch maybe we could get rid of
default_irq, which also seems nice.

> > > if we don't want to change the behaviour of any existing ABI, it
> > > seems the only thing we can do here to document it well in ABI
> > > doc. i actually doubt anyone has really understood what the irq
> > > entry is really showing.
> >
> > Given that we can't prove that it is actually the case, I believe this
> > is the only option.
> 
> we have to document the ABI like below though it seems quite annoying.
> 
> 1. for devices which don't support MSI and MSI-X, show legacy INTx
> 2. for devices which support MSI
>     a. if CONFIG_PCI_MSI is not enabled,  show legacy INTx
>     b. if CONFIG_PCI_MSI is enabled and devices are using MSI at this
> moment, show 1st IRQ in the vector
>     c. if CONFIG_PCI_MSI is enabled, but we shutdown its MSI before
> the users call sysfs entry,
>         so at this moment, devices are not using MSI,  show legacy INTx
> 3. for devices which support MSI-X, no matter if it is using MSI-X,
>     show legacy INTx
> 4. In Addition, INTx might be broken due to incomplete firmware or
> hardware design for MSI and MSI-X cases
> 
> To be honest, it sounds like a disaster :-) but if this is what we
> have to do, I'd like to try it in v3.

It doesn't seem necessary to me to get into the gory details of
CONFIG_PCI_MSI -- if that's not enabled, drivers can't use MSI anyway.

I don't understand 3.  If a device supports both MSI and MSI-X and a
driver enables MSI, msi_capability_init() writes dev->irq, so it looks
like "irq" should contain the first MSI vector.

I don't understand 4, either.  Is the possibility of broken hardware
or firmware something we need to document?  

What about something like this?

  If a driver has enabled MSI (not MSI-X), "irq" contains the IRQ of
  the first MSI vector.  Otherwise "irq" contains the IRQ of the
  legacy INTx interrupt.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/2] PCI/MSI: Fix the confusing IRQ sysfs ABI for MSI-X
  2021-08-21 10:42     ` Marc Zyngier
  2021-08-21 22:14       ` Barry Song
@ 2021-08-24 20:51       ` Barry Song
  2021-08-24 21:29         ` Barry Song
  1 sibling, 1 reply; 19+ messages in thread
From: Barry Song @ 2021-08-24 20:51 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Bjorn Helgaas, Bjorn Helgaas, Jonathan Corbet, Jonathan.Cameron,
	bilbao, Greg Kroah-Hartman, leon, LKML, linux-pci, Linuxarm,
	luzmaximilian, mchehab+huawei, schnelle, Barry Song,
	Thomas Gleixner

On Sat, Aug 21, 2021 at 10:42 PM Marc Zyngier <maz@kernel.org> wrote:
>
> Hi Bjorn,
>
> On Sat, 21 Aug 2021 00:33:28 +0100,
> Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > [+cc Thomas, Marc]
> >
> > On Sat, Aug 21, 2021 at 10:37:43AM +1200, Barry Song wrote:
> > > From: Barry Song <song.bao.hua@hisilicon.com>
> > >
> > > /sys/bus/pci/devices/.../irq sysfs ABI is very confusing at this
> > > moment especially for MSI-X cases.
> >
> > AFAICT this patch *only* affects MSI-X.  So are you saying the sysfs
> > ABI is fine for MSI but confusing for MSI-X?
> >
> > > While MSI sets IRQ to the first
> > > number in the vector, MSI-X does nothing for this though it saves
> > > default_irq in msix_setup_entries(). Weird the saved default_irq
> > > for MSI-X is never used in pci_msix_shutdown(), which is quite
> > > different with pci_msi_shutdown(). Thus, this patch moves to show
> > > the first IRQ number which is from the first msi_entry for MSI-X.
> > > Hopefully, this can make IRQ ABI more clear and more consistent.
> > >
> > > Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
> > > ---
> > >  drivers/pci/msi.c | 6 ++++++
> > >  1 file changed, 6 insertions(+)
> > >
> > > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> > > index 9232255..6bbf81b 100644
> > > --- a/drivers/pci/msi.c
> > > +++ b/drivers/pci/msi.c
> > > @@ -771,6 +771,7 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
> > >     int ret;
> > >     u16 control;
> > >     void __iomem *base;
> > > +   struct msi_desc *desc;
> > >
> > >     /* Ensure MSI-X is disabled while it is set up */
> > >     pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0);
> > > @@ -814,6 +815,10 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
> > >     pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
> > >
> > >     pcibios_free_irq(dev);
> > > +
> > > +   desc = first_pci_msi_entry(dev);
> > > +   dev->irq = desc->irq;
> >
> > This change is not primarily about sysfs.  This is about changing
> > "dev->irq" when MSI-X is enabled, and it's only incidental that sysfs
> > reflects that.
> >
> > So we need to know the effect of changing dev->irq.  Drivers may use
> > the value of dev->irq, and I'm *guessing* this change shouldn't break
> > them since we already do this for MSI, but I'd like some more expert
> > opinion than mine :)
> >
> > For MSI we have:
> >
> >   msi_capability_init
> >     msi_setup_entry
> >       entry = alloc_msi_entry(nvec)
> >       entry->msi_attrib.default_irq = dev->irq;     /* Save IOAPIC IRQ */
> >     dev->irq = entry->irq;
> >
> >   pci_msi_shutdown
> >     /* Restore dev->irq to its default pin-assertion IRQ */
> >     dev->irq = desc->msi_attrib.default_irq;
> >
> > and for MSI-X we have:
> >
> >   msix_capability_init
> >     msix_setup_entries
> >       for (i = 0; i < nvec; i++)
> >         entry = alloc_msi_entry(1)
> >       entry->msi_attrib.default_irq = dev->irq;
> >
> >   pci_msix_shutdown
> >     for_each_pci_msi_entry(entry, dev)
> >       __pci_msix_desc_mask_irq
> > +   dev->irq = entry->msi_attrib.default_irq;   # added by this patch
> >
> >
> > Things that seem strange to me:
> >
> >   - The msi_setup_entry() comment "Save IOAPIC IRQ" seems needlessly
> >     specific; maybe it should be "INTx IRQ".
> >
> >   - The pci_msi_shutdown() comment "Restore ... pin-assertion IRQ"
> >     should match the msi_setup_entry() one, e.g., maybe it should also
> >     be "INTx IRQ".  There are no INTx or IOAPIC pins in PCIe.
> >
> >   - The only use of .default_irq is to save and restore dev->irq, so
> >     it looks like a per-device thing, not a per-vector thing.
> >
> >     In msi_setup_entry() there's only one msi_entry, so there's only
> >     one saved .default_irq.
> >
> >     In msix_setup_entries(), we get nvecs msi_entry structs, and we
> >     get a saved .default_irq in each one?
>
> That's a key point.
>
> Old-school PCI/MSI is represented by a single interrupt, and you
> *could* somehow make it relatively easy for drivers that only
> understand INTx to migrate to MSI if you replaced whatever is held in
> dev->irq (which should only represent the INTx mapping) with the MSI
> interrupt number. Which I guess is what the MSI code is doing.
>
> This is the 21st century, and nobody should ever rely on such horror,
> but I'm sure we do have such drivers in the tree. Boo.
>
> However, this *cannot* hold true for Multi-MSI, nor MSI-X, because
> there is a plurality of interrupts. Even worse, for MSI-X, there is
> zero guarantee that the allocated interrupts will be in a contiguous
> space.
>
> Given that, what is dev->irq good for? "Absolutely Nothing! (say it
> again!)".
>
> MSI-X is not something you can "accidentally" use. You have to
> actively embrace it. In all honesty, this patch tries to move in the
> wrong direction. If anything, we should kill this hack altogether and
> fix the (handful of?) drivers that rely on it. That'd actually be a
> good way to find whether they are still worth keeping in the tree. And
> if it breaks too many of them, then at least we'll know where we
> stand.
>
> I'd be tempted to leave the below patch simmer in -next for a few
> weeks and see if how many people shout:
>
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index e5e75331b415..2be9a01cbe72 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -591,7 +591,6 @@ msi_setup_entry(struct pci_dev *dev, int nvec, struct irq_affinity *affd)
>         entry->msi_attrib.is_virtual    = 0;
>         entry->msi_attrib.entry_nr      = 0;
>         entry->msi_attrib.maskbit       = !!(control & PCI_MSI_FLAGS_MASKBIT);
> -       entry->msi_attrib.default_irq   = dev->irq;     /* Save IOAPIC IRQ */
>         entry->msi_attrib.multi_cap     = (control & PCI_MSI_FLAGS_QMASK) >> 1;
>         entry->msi_attrib.multiple      = ilog2(__roundup_pow_of_two(nvec));
>
> @@ -682,7 +681,6 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
>         dev->msi_enabled = 1;
>
>         pcibios_free_irq(dev);
> -       dev->irq = entry->irq;
>         return 0;
>  }
>
> @@ -742,7 +740,6 @@ static int msix_setup_entries(struct pci_dev *dev, void __iomem *base,
>                 entry->msi_attrib.is_virtual =
>                         entry->msi_attrib.entry_nr >= vec_count;
>
> -               entry->msi_attrib.default_irq   = dev->irq;
>                 entry->mask_base                = base;
>
>                 addr = pci_msix_desc_addr(entry);
> @@ -964,8 +961,6 @@ static void pci_msi_shutdown(struct pci_dev *dev)
>         mask = msi_mask(desc->msi_attrib.multi_cap);
>         msi_mask_irq(desc, mask, 0);
>
> -       /* Restore dev->irq to its default pin-assertion IRQ */
> -       dev->irq = desc->msi_attrib.default_irq;
>         pcibios_alloc_irq(dev);
>  }
>
> diff --git a/include/linux/msi.h b/include/linux/msi.h
> index e8bdcb83172b..a631664c1c38 100644
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -114,7 +114,6 @@ struct ti_sci_inta_msi_desc {
>   * @maskbit:   [PCI MSI/X] Mask-Pending bit supported?
>   * @is_64:     [PCI MSI/X] Address size: 0=32bit 1=64bit
>   * @entry_nr:  [PCI MSI/X] Entry which is described by this descriptor
> - * @default_irq:[PCI MSI/X] The default pre-assigned non-MSI irq
>   * @mask_pos:  [PCI MSI]   Mask register position
>   * @mask_base: [PCI MSI-X] Mask register base address
>   * @platform:  [platform]  Platform device specific msi descriptor data
> @@ -148,7 +147,6 @@ struct msi_desc {
>                                 u8      is_64           : 1;
>                                 u8      is_virtual      : 1;
>                                 u16     entry_nr;
> -                               unsigned default_irq;
>                         } msi_attrib;
>                         union {
>                                 u8      mask_pos;
>

We will also need the below change as  pci_irq_vector() depends on
dev->irq for the MSI case.

int pci_irq_vector(struct pci_dev *dev, unsigned int nr)
{
        if (dev->msix_enabled) {
                struct msi_desc *entry;
                int i = 0;

                for_each_pci_msi_entry(entry, dev) {
                        if (i == nr)
                                return entry->irq;
                        i++;
                }
                WARN_ON_ONCE(1);
                return -EINVAL;
        }

        if (dev->msi_enabled) {
                struct msi_desc *entry = first_pci_msi_entry(dev);

                if (WARN_ON_ONCE(nr >= entry->nvec_used))
                        return -EINVAL;

+                return entry->irq + nr;
        } else {
                if (WARN_ON_ONCE(nr > 0))
                        return -EINVAL;
        }


-        return dev->irq + nr;
+       return dev->irq;
}
EXPORT_SYMBOL(pci_irq_vector);

> Thanks,
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.

Thanks
barry

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/2] PCI/MSI: Fix the confusing IRQ sysfs ABI for MSI-X
  2021-08-24 20:51       ` Barry Song
@ 2021-08-24 21:29         ` Barry Song
  2021-08-25 10:24           ` Marc Zyngier
  0 siblings, 1 reply; 19+ messages in thread
From: Barry Song @ 2021-08-24 21:29 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Bjorn Helgaas, Bjorn Helgaas, Jonathan Corbet, Jonathan.Cameron,
	bilbao, Greg Kroah-Hartman, leon, LKML, linux-pci, Linuxarm,
	luzmaximilian, mchehab+huawei, schnelle, Barry Song,
	Thomas Gleixner

On Wed, Aug 25, 2021 at 8:51 AM Barry Song <21cnbao@gmail.com> wrote:
>
> On Sat, Aug 21, 2021 at 10:42 PM Marc Zyngier <maz@kernel.org> wrote:
> >
> > Hi Bjorn,
> >
> > On Sat, 21 Aug 2021 00:33:28 +0100,
> > Bjorn Helgaas <helgaas@kernel.org> wrote:
> > >
> > > [+cc Thomas, Marc]
> > >
> > > On Sat, Aug 21, 2021 at 10:37:43AM +1200, Barry Song wrote:
> > > > From: Barry Song <song.bao.hua@hisilicon.com>
> > > >
> > > > /sys/bus/pci/devices/.../irq sysfs ABI is very confusing at this
> > > > moment especially for MSI-X cases.
> > >
> > > AFAICT this patch *only* affects MSI-X.  So are you saying the sysfs
> > > ABI is fine for MSI but confusing for MSI-X?
> > >
> > > > While MSI sets IRQ to the first
> > > > number in the vector, MSI-X does nothing for this though it saves
> > > > default_irq in msix_setup_entries(). Weird the saved default_irq
> > > > for MSI-X is never used in pci_msix_shutdown(), which is quite
> > > > different with pci_msi_shutdown(). Thus, this patch moves to show
> > > > the first IRQ number which is from the first msi_entry for MSI-X.
> > > > Hopefully, this can make IRQ ABI more clear and more consistent.
> > > >
> > > > Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > > Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
> > > > ---
> > > >  drivers/pci/msi.c | 6 ++++++
> > > >  1 file changed, 6 insertions(+)
> > > >
> > > > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> > > > index 9232255..6bbf81b 100644
> > > > --- a/drivers/pci/msi.c
> > > > +++ b/drivers/pci/msi.c
> > > > @@ -771,6 +771,7 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
> > > >     int ret;
> > > >     u16 control;
> > > >     void __iomem *base;
> > > > +   struct msi_desc *desc;
> > > >
> > > >     /* Ensure MSI-X is disabled while it is set up */
> > > >     pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0);
> > > > @@ -814,6 +815,10 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
> > > >     pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
> > > >
> > > >     pcibios_free_irq(dev);
> > > > +
> > > > +   desc = first_pci_msi_entry(dev);
> > > > +   dev->irq = desc->irq;
> > >
> > > This change is not primarily about sysfs.  This is about changing
> > > "dev->irq" when MSI-X is enabled, and it's only incidental that sysfs
> > > reflects that.
> > >
> > > So we need to know the effect of changing dev->irq.  Drivers may use
> > > the value of dev->irq, and I'm *guessing* this change shouldn't break
> > > them since we already do this for MSI, but I'd like some more expert
> > > opinion than mine :)
> > >
> > > For MSI we have:
> > >
> > >   msi_capability_init
> > >     msi_setup_entry
> > >       entry = alloc_msi_entry(nvec)
> > >       entry->msi_attrib.default_irq = dev->irq;     /* Save IOAPIC IRQ */
> > >     dev->irq = entry->irq;
> > >
> > >   pci_msi_shutdown
> > >     /* Restore dev->irq to its default pin-assertion IRQ */
> > >     dev->irq = desc->msi_attrib.default_irq;
> > >
> > > and for MSI-X we have:
> > >
> > >   msix_capability_init
> > >     msix_setup_entries
> > >       for (i = 0; i < nvec; i++)
> > >         entry = alloc_msi_entry(1)
> > >       entry->msi_attrib.default_irq = dev->irq;
> > >
> > >   pci_msix_shutdown
> > >     for_each_pci_msi_entry(entry, dev)
> > >       __pci_msix_desc_mask_irq
> > > +   dev->irq = entry->msi_attrib.default_irq;   # added by this patch
> > >
> > >
> > > Things that seem strange to me:
> > >
> > >   - The msi_setup_entry() comment "Save IOAPIC IRQ" seems needlessly
> > >     specific; maybe it should be "INTx IRQ".
> > >
> > >   - The pci_msi_shutdown() comment "Restore ... pin-assertion IRQ"
> > >     should match the msi_setup_entry() one, e.g., maybe it should also
> > >     be "INTx IRQ".  There are no INTx or IOAPIC pins in PCIe.
> > >
> > >   - The only use of .default_irq is to save and restore dev->irq, so
> > >     it looks like a per-device thing, not a per-vector thing.
> > >
> > >     In msi_setup_entry() there's only one msi_entry, so there's only
> > >     one saved .default_irq.
> > >
> > >     In msix_setup_entries(), we get nvecs msi_entry structs, and we
> > >     get a saved .default_irq in each one?
> >
> > That's a key point.
> >
> > Old-school PCI/MSI is represented by a single interrupt, and you
> > *could* somehow make it relatively easy for drivers that only
> > understand INTx to migrate to MSI if you replaced whatever is held in
> > dev->irq (which should only represent the INTx mapping) with the MSI
> > interrupt number. Which I guess is what the MSI code is doing.
> >
> > This is the 21st century, and nobody should ever rely on such horror,
> > but I'm sure we do have such drivers in the tree. Boo.
> >
> > However, this *cannot* hold true for Multi-MSI, nor MSI-X, because
> > there is a plurality of interrupts. Even worse, for MSI-X, there is
> > zero guarantee that the allocated interrupts will be in a contiguous
> > space.
> >
> > Given that, what is dev->irq good for? "Absolutely Nothing! (say it
> > again!)".
> >
> > MSI-X is not something you can "accidentally" use. You have to
> > actively embrace it. In all honesty, this patch tries to move in the
> > wrong direction. If anything, we should kill this hack altogether and
> > fix the (handful of?) drivers that rely on it. That'd actually be a
> > good way to find whether they are still worth keeping in the tree. And
> > if it breaks too many of them, then at least we'll know where we
> > stand.
> >
> > I'd be tempted to leave the below patch simmer in -next for a few
> > weeks and see if how many people shout:
> >
> > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> > index e5e75331b415..2be9a01cbe72 100644
> > --- a/drivers/pci/msi.c
> > +++ b/drivers/pci/msi.c
> > @@ -591,7 +591,6 @@ msi_setup_entry(struct pci_dev *dev, int nvec, struct irq_affinity *affd)
> >         entry->msi_attrib.is_virtual    = 0;
> >         entry->msi_attrib.entry_nr      = 0;
> >         entry->msi_attrib.maskbit       = !!(control & PCI_MSI_FLAGS_MASKBIT);
> > -       entry->msi_attrib.default_irq   = dev->irq;     /* Save IOAPIC IRQ */
> >         entry->msi_attrib.multi_cap     = (control & PCI_MSI_FLAGS_QMASK) >> 1;
> >         entry->msi_attrib.multiple      = ilog2(__roundup_pow_of_two(nvec));
> >
> > @@ -682,7 +681,6 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
> >         dev->msi_enabled = 1;
> >
> >         pcibios_free_irq(dev);
> > -       dev->irq = entry->irq;
> >         return 0;
> >  }
> >
> > @@ -742,7 +740,6 @@ static int msix_setup_entries(struct pci_dev *dev, void __iomem *base,
> >                 entry->msi_attrib.is_virtual =
> >                         entry->msi_attrib.entry_nr >= vec_count;
> >
> > -               entry->msi_attrib.default_irq   = dev->irq;
> >                 entry->mask_base                = base;
> >
> >                 addr = pci_msix_desc_addr(entry);
> > @@ -964,8 +961,6 @@ static void pci_msi_shutdown(struct pci_dev *dev)
> >         mask = msi_mask(desc->msi_attrib.multi_cap);
> >         msi_mask_irq(desc, mask, 0);
> >
> > -       /* Restore dev->irq to its default pin-assertion IRQ */
> > -       dev->irq = desc->msi_attrib.default_irq;
> >         pcibios_alloc_irq(dev);
> >  }
> >
> > diff --git a/include/linux/msi.h b/include/linux/msi.h
> > index e8bdcb83172b..a631664c1c38 100644
> > --- a/include/linux/msi.h
> > +++ b/include/linux/msi.h
> > @@ -114,7 +114,6 @@ struct ti_sci_inta_msi_desc {
> >   * @maskbit:   [PCI MSI/X] Mask-Pending bit supported?
> >   * @is_64:     [PCI MSI/X] Address size: 0=32bit 1=64bit
> >   * @entry_nr:  [PCI MSI/X] Entry which is described by this descriptor
> > - * @default_irq:[PCI MSI/X] The default pre-assigned non-MSI irq
> >   * @mask_pos:  [PCI MSI]   Mask register position
> >   * @mask_base: [PCI MSI-X] Mask register base address
> >   * @platform:  [platform]  Platform device specific msi descriptor data
> > @@ -148,7 +147,6 @@ struct msi_desc {
> >                                 u8      is_64           : 1;
> >                                 u8      is_virtual      : 1;
> >                                 u16     entry_nr;
> > -                               unsigned default_irq;
> >                         } msi_attrib;
> >                         union {
> >                                 u8      mask_pos;
> >
>
> We will also need the below change as  pci_irq_vector() depends on
> dev->irq for the MSI case.
>
> int pci_irq_vector(struct pci_dev *dev, unsigned int nr)
> {
>         if (dev->msix_enabled) {
>                 struct msi_desc *entry;
>                 int i = 0;
>
>                 for_each_pci_msi_entry(entry, dev) {
>                         if (i == nr)
>                                 return entry->irq;
>                         i++;
>                 }
>                 WARN_ON_ONCE(1);
>                 return -EINVAL;
>         }
>
>         if (dev->msi_enabled) {
>                 struct msi_desc *entry = first_pci_msi_entry(dev);
>
>                 if (WARN_ON_ONCE(nr >= entry->nvec_used))
>                         return -EINVAL;
>
> +                return entry->irq + nr;
>         } else {
>                 if (WARN_ON_ONCE(nr > 0))
>                         return -EINVAL;
>         }
>
>
> -        return dev->irq + nr;
> +       return dev->irq;
> }
> EXPORT_SYMBOL(pci_irq_vector);
>

And here:

--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -401,7 +401,7 @@ static void __pci_restore_msi_state(struct pci_dev *dev)
        if (!dev->msi_enabled)
                return;

-       entry = irq_get_msi_desc(dev->irq);
+       entry = first_pci_msi_entry(dev);

        pci_intx_for_msi(dev, 0);
        pci_msi_set_enable(dev, 0);

since drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c is the only one
calling pci_restore_msi_state(),
will probably require one test from the driver maintainers.

> > Thanks,
> >
> >         M.
> >
> > --
> > Without deviation from the norm, progress is not possible.
>
Thanks
barry

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/2] PCI/MSI: Fix the confusing IRQ sysfs ABI for MSI-X
  2021-08-25 10:24           ` Marc Zyngier
@ 2021-08-24 22:51             ` Barry Song
  0 siblings, 0 replies; 19+ messages in thread
From: Barry Song @ 2021-08-24 22:51 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Bjorn Helgaas, Bjorn Helgaas, Jonathan Corbet, Jonathan.Cameron,
	bilbao, Greg Kroah-Hartman, leon, LKML, linux-pci, Linuxarm,
	luzmaximilian, mchehab+huawei, schnelle, Barry Song,
	Thomas Gleixner

On Wed, Aug 25, 2021 at 10:24 PM Marc Zyngier <maz@kernel.org> wrote:
>
> On Tue, 24 Aug 2021 22:29:59 +0100,
> Barry Song <21cnbao@gmail.com> wrote:
> >
> > On Wed, Aug 25, 2021 at 8:51 AM Barry Song <21cnbao@gmail.com> wrote:
> > >
> > > On Sat, Aug 21, 2021 at 10:42 PM Marc Zyngier <maz@kernel.org> wrote:
> > > >
> > > > Hi Bjorn,
> > > >
> > > > On Sat, 21 Aug 2021 00:33:28 +0100,
> > > > Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > >
> > > > > [+cc Thomas, Marc]
> > > > >
> > > > > On Sat, Aug 21, 2021 at 10:37:43AM +1200, Barry Song wrote:
> > > > > > From: Barry Song <song.bao.hua@hisilicon.com>
> > > > > >
> > > > > > /sys/bus/pci/devices/.../irq sysfs ABI is very confusing at this
> > > > > > moment especially for MSI-X cases.
> > > > >
> > > > > AFAICT this patch *only* affects MSI-X.  So are you saying the sysfs
> > > > > ABI is fine for MSI but confusing for MSI-X?
> > > > >
> > > > > > While MSI sets IRQ to the first
> > > > > > number in the vector, MSI-X does nothing for this though it saves
> > > > > > default_irq in msix_setup_entries(). Weird the saved default_irq
> > > > > > for MSI-X is never used in pci_msix_shutdown(), which is quite
> > > > > > different with pci_msi_shutdown(). Thus, this patch moves to show
> > > > > > the first IRQ number which is from the first msi_entry for MSI-X.
> > > > > > Hopefully, this can make IRQ ABI more clear and more consistent.
> > > > > >
> > > > > > Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > > > > Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
> > > > > > ---
> > > > > >  drivers/pci/msi.c | 6 ++++++
> > > > > >  1 file changed, 6 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> > > > > > index 9232255..6bbf81b 100644
> > > > > > --- a/drivers/pci/msi.c
> > > > > > +++ b/drivers/pci/msi.c
> > > > > > @@ -771,6 +771,7 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
> > > > > >     int ret;
> > > > > >     u16 control;
> > > > > >     void __iomem *base;
> > > > > > +   struct msi_desc *desc;
> > > > > >
> > > > > >     /* Ensure MSI-X is disabled while it is set up */
> > > > > >     pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0);
> > > > > > @@ -814,6 +815,10 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
> > > > > >     pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
> > > > > >
> > > > > >     pcibios_free_irq(dev);
> > > > > > +
> > > > > > +   desc = first_pci_msi_entry(dev);
> > > > > > +   dev->irq = desc->irq;
> > > > >
> > > > > This change is not primarily about sysfs.  This is about changing
> > > > > "dev->irq" when MSI-X is enabled, and it's only incidental that sysfs
> > > > > reflects that.
> > > > >
> > > > > So we need to know the effect of changing dev->irq.  Drivers may use
> > > > > the value of dev->irq, and I'm *guessing* this change shouldn't break
> > > > > them since we already do this for MSI, but I'd like some more expert
> > > > > opinion than mine :)
> > > > >
> > > > > For MSI we have:
> > > > >
> > > > >   msi_capability_init
> > > > >     msi_setup_entry
> > > > >       entry = alloc_msi_entry(nvec)
> > > > >       entry->msi_attrib.default_irq = dev->irq;     /* Save IOAPIC IRQ */
> > > > >     dev->irq = entry->irq;
> > > > >
> > > > >   pci_msi_shutdown
> > > > >     /* Restore dev->irq to its default pin-assertion IRQ */
> > > > >     dev->irq = desc->msi_attrib.default_irq;
> > > > >
> > > > > and for MSI-X we have:
> > > > >
> > > > >   msix_capability_init
> > > > >     msix_setup_entries
> > > > >       for (i = 0; i < nvec; i++)
> > > > >         entry = alloc_msi_entry(1)
> > > > >       entry->msi_attrib.default_irq = dev->irq;
> > > > >
> > > > >   pci_msix_shutdown
> > > > >     for_each_pci_msi_entry(entry, dev)
> > > > >       __pci_msix_desc_mask_irq
> > > > > +   dev->irq = entry->msi_attrib.default_irq;   # added by this patch
> > > > >
> > > > >
> > > > > Things that seem strange to me:
> > > > >
> > > > >   - The msi_setup_entry() comment "Save IOAPIC IRQ" seems needlessly
> > > > >     specific; maybe it should be "INTx IRQ".
> > > > >
> > > > >   - The pci_msi_shutdown() comment "Restore ... pin-assertion IRQ"
> > > > >     should match the msi_setup_entry() one, e.g., maybe it should also
> > > > >     be "INTx IRQ".  There are no INTx or IOAPIC pins in PCIe.
> > > > >
> > > > >   - The only use of .default_irq is to save and restore dev->irq, so
> > > > >     it looks like a per-device thing, not a per-vector thing.
> > > > >
> > > > >     In msi_setup_entry() there's only one msi_entry, so there's only
> > > > >     one saved .default_irq.
> > > > >
> > > > >     In msix_setup_entries(), we get nvecs msi_entry structs, and we
> > > > >     get a saved .default_irq in each one?
> > > >
> > > > That's a key point.
> > > >
> > > > Old-school PCI/MSI is represented by a single interrupt, and you
> > > > *could* somehow make it relatively easy for drivers that only
> > > > understand INTx to migrate to MSI if you replaced whatever is held in
> > > > dev->irq (which should only represent the INTx mapping) with the MSI
> > > > interrupt number. Which I guess is what the MSI code is doing.
> > > >
> > > > This is the 21st century, and nobody should ever rely on such horror,
> > > > but I'm sure we do have such drivers in the tree. Boo.
> > > >
> > > > However, this *cannot* hold true for Multi-MSI, nor MSI-X, because
> > > > there is a plurality of interrupts. Even worse, for MSI-X, there is
> > > > zero guarantee that the allocated interrupts will be in a contiguous
> > > > space.
> > > >
> > > > Given that, what is dev->irq good for? "Absolutely Nothing! (say it
> > > > again!)".
> > > >
> > > > MSI-X is not something you can "accidentally" use. You have to
> > > > actively embrace it. In all honesty, this patch tries to move in the
> > > > wrong direction. If anything, we should kill this hack altogether and
> > > > fix the (handful of?) drivers that rely on it. That'd actually be a
> > > > good way to find whether they are still worth keeping in the tree. And
> > > > if it breaks too many of them, then at least we'll know where we
> > > > stand.
> > > >
> > > > I'd be tempted to leave the below patch simmer in -next for a few
> > > > weeks and see if how many people shout:
> > > >
> > > > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> > > > index e5e75331b415..2be9a01cbe72 100644
> > > > --- a/drivers/pci/msi.c
> > > > +++ b/drivers/pci/msi.c
> > > > @@ -591,7 +591,6 @@ msi_setup_entry(struct pci_dev *dev, int nvec, struct irq_affinity *affd)
> > > >         entry->msi_attrib.is_virtual    = 0;
> > > >         entry->msi_attrib.entry_nr      = 0;
> > > >         entry->msi_attrib.maskbit       = !!(control & PCI_MSI_FLAGS_MASKBIT);
> > > > -       entry->msi_attrib.default_irq   = dev->irq;     /* Save IOAPIC IRQ */
> > > >         entry->msi_attrib.multi_cap     = (control & PCI_MSI_FLAGS_QMASK) >> 1;
> > > >         entry->msi_attrib.multiple      = ilog2(__roundup_pow_of_two(nvec));
> > > >
> > > > @@ -682,7 +681,6 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
> > > >         dev->msi_enabled = 1;
> > > >
> > > >         pcibios_free_irq(dev);
> > > > -       dev->irq = entry->irq;
> > > >         return 0;
> > > >  }
> > > >
> > > > @@ -742,7 +740,6 @@ static int msix_setup_entries(struct pci_dev *dev, void __iomem *base,
> > > >                 entry->msi_attrib.is_virtual =
> > > >                         entry->msi_attrib.entry_nr >= vec_count;
> > > >
> > > > -               entry->msi_attrib.default_irq   = dev->irq;
> > > >                 entry->mask_base                = base;
> > > >
> > > >                 addr = pci_msix_desc_addr(entry);
> > > > @@ -964,8 +961,6 @@ static void pci_msi_shutdown(struct pci_dev *dev)
> > > >         mask = msi_mask(desc->msi_attrib.multi_cap);
> > > >         msi_mask_irq(desc, mask, 0);
> > > >
> > > > -       /* Restore dev->irq to its default pin-assertion IRQ */
> > > > -       dev->irq = desc->msi_attrib.default_irq;
> > > >         pcibios_alloc_irq(dev);
> > > >  }
> > > >
> > > > diff --git a/include/linux/msi.h b/include/linux/msi.h
> > > > index e8bdcb83172b..a631664c1c38 100644
> > > > --- a/include/linux/msi.h
> > > > +++ b/include/linux/msi.h
> > > > @@ -114,7 +114,6 @@ struct ti_sci_inta_msi_desc {
> > > >   * @maskbit:   [PCI MSI/X] Mask-Pending bit supported?
> > > >   * @is_64:     [PCI MSI/X] Address size: 0=32bit 1=64bit
> > > >   * @entry_nr:  [PCI MSI/X] Entry which is described by this descriptor
> > > > - * @default_irq:[PCI MSI/X] The default pre-assigned non-MSI irq
> > > >   * @mask_pos:  [PCI MSI]   Mask register position
> > > >   * @mask_base: [PCI MSI-X] Mask register base address
> > > >   * @platform:  [platform]  Platform device specific msi descriptor data
> > > > @@ -148,7 +147,6 @@ struct msi_desc {
> > > >                                 u8      is_64           : 1;
> > > >                                 u8      is_virtual      : 1;
> > > >                                 u16     entry_nr;
> > > > -                               unsigned default_irq;
> > > >                         } msi_attrib;
> > > >                         union {
> > > >                                 u8      mask_pos;
> > > >
> > >
> > > We will also need the below change as  pci_irq_vector() depends on
> > > dev->irq for the MSI case.
> > >
> > > int pci_irq_vector(struct pci_dev *dev, unsigned int nr)
> > > {
> > >         if (dev->msix_enabled) {
> > >                 struct msi_desc *entry;
> > >                 int i = 0;
> > >
> > >                 for_each_pci_msi_entry(entry, dev) {
> > >                         if (i == nr)
> > >                                 return entry->irq;
> > >                         i++;
> > >                 }
> > >                 WARN_ON_ONCE(1);
> > >                 return -EINVAL;
> > >         }
> > >
> > >         if (dev->msi_enabled) {
> > >                 struct msi_desc *entry = first_pci_msi_entry(dev);
> > >
> > >                 if (WARN_ON_ONCE(nr >= entry->nvec_used))
> > >                         return -EINVAL;
> > >
> > > +                return entry->irq + nr;
> > >         } else {
> > >                 if (WARN_ON_ONCE(nr > 0))
> > >                         return -EINVAL;
> > >         }
> > >
> > >
> > > -        return dev->irq + nr;
> > > +       return dev->irq;
> > > }
> > > EXPORT_SYMBOL(pci_irq_vector);
> > >
> >
> > And here:
> >
> > --- a/drivers/pci/msi.c
> > +++ b/drivers/pci/msi.c
> > @@ -401,7 +401,7 @@ static void __pci_restore_msi_state(struct pci_dev *dev)
> >         if (!dev->msi_enabled)
> >                 return;
> >
> > -       entry = irq_get_msi_desc(dev->irq);
> > +       entry = first_pci_msi_entry(dev);
> >
> >         pci_intx_for_msi(dev, 0);
> >         pci_msi_set_enable(dev, 0);
> >
> > since drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c is the only
> > one calling pci_restore_msi_state(), will probably require one test
> > from the driver maintainers.
>
> All good catches. This needs tons of testing, and I fully expect
> regressions (at least drivers falling back to INTx when they were
> planning to use MSIs).
>
> I'll try to roll a proper patch once 5.14 is out.

Thanks! Anyway, I've integrated all good suggestions from the
discussion into v3:
https://lore.kernel.org/lkml/20210825102636.52757-1-21cnbao@gmail.com/
So that we can have a new base to start after 5.14 is out.

>
> Thanks,
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.

Thanks
barry

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/2] PCI/MSI: Fix the confusing IRQ sysfs ABI for MSI-X
  2021-08-24 19:34                 ` Bjorn Helgaas
@ 2021-08-25  9:45                   ` Marc Zyngier
  0 siblings, 0 replies; 19+ messages in thread
From: Marc Zyngier @ 2021-08-25  9:45 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Barry Song, Bjorn Helgaas, Jonathan Corbet, Jonathan.Cameron,
	bilbao, Greg Kroah-Hartman, leon, LKML, linux-pci, Linuxarm,
	luzmaximilian, mchehab+huawei, schnelle, Barry Song,
	Thomas Gleixner

On Tue, 24 Aug 2021 20:34:38 +0100,
Bjorn Helgaas <helgaas@kernel.org> wrote:
> 
> On Tue, Aug 24, 2021 at 10:46:59AM +1200, Barry Song wrote:
> > On Mon, Aug 23, 2021 at 11:28 PM Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > On Mon, 23 Aug 2021 12:03:08 +0100,
> > > Barry Song <21cnbao@gmail.com> wrote:
> 
> > +static ssize_t irq_show(struct device *dev,
> > +                                        struct device_attribute *attr,
> > +                                        char *buf)
> > +{
> > +       struct pci_dev *pdev = to_pci_dev(dev);
> > +#ifdef CONFIG_PCI_MSI
> > +       struct msi_desc *desc = first_pci_msi_entry(pdev);
> > +
> > +       /* for MSI, return the 1st IRQ in IRQ vector */
> > +       if (desc && !desc->msi_attrib.is_msix)
> > +               return sysfs_emit(buf, "%u\n", desc->irq);
> > +#endif
> > +
> > +       return sysfs_emit(buf, "%u\n", pdev->irq);
> > +}
> > +static DEVICE_ATTR_RO(irq);
> 
> Makes sense to me.  And with Marc's patch maybe we could get rid of
> default_irq, which also seems nice.
> 
> > > > if we don't want to change the behaviour of any existing ABI, it
> > > > seems the only thing we can do here to document it well in ABI
> > > > doc. i actually doubt anyone has really understood what the irq
> > > > entry is really showing.
> > >
> > > Given that we can't prove that it is actually the case, I believe this
> > > is the only option.
> > 
> > we have to document the ABI like below though it seems quite annoying.
> > 
> > 1. for devices which don't support MSI and MSI-X, show legacy INTx
> > 2. for devices which support MSI
> >     a. if CONFIG_PCI_MSI is not enabled,  show legacy INTx
> >     b. if CONFIG_PCI_MSI is enabled and devices are using MSI at this
> > moment, show 1st IRQ in the vector
> >     c. if CONFIG_PCI_MSI is enabled, but we shutdown its MSI before
> > the users call sysfs entry,
> >         so at this moment, devices are not using MSI,  show legacy INTx
> > 3. for devices which support MSI-X, no matter if it is using MSI-X,
> >     show legacy INTx
> > 4. In Addition, INTx might be broken due to incomplete firmware or
> > hardware design for MSI and MSI-X cases
> > 
> > To be honest, it sounds like a disaster :-) but if this is what we
> > have to do, I'd like to try it in v3.
> 
> It doesn't seem necessary to me to get into the gory details of
> CONFIG_PCI_MSI -- if that's not enabled, drivers can't use MSI anyway.
> 
> I don't understand 3.  If a device supports both MSI and MSI-X and a
> driver enables MSI, msi_capability_init() writes dev->irq, so it looks
> like "irq" should contain the first MSI vector.
> 
> I don't understand 4, either.  Is the possibility of broken hardware
> or firmware something we need to document?  
> 
> What about something like this?
> 
>   If a driver has enabled MSI (not MSI-X), "irq" contains the IRQ of
>   the first MSI vector.  Otherwise "irq" contains the IRQ of the
>   legacy INTx interrupt.
> 

I think that pretty much nails it. CONFIG_MSI is not something that
userspace can (nor should) discover anyway.

For (4), you may want to add that

  "irq" being set to 0 indicates that the device isn't capable of
  generating legacy INTx interrupts.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/2] PCI/MSI: Fix the confusing IRQ sysfs ABI for MSI-X
  2021-08-24 21:29         ` Barry Song
@ 2021-08-25 10:24           ` Marc Zyngier
  2021-08-24 22:51             ` Barry Song
  0 siblings, 1 reply; 19+ messages in thread
From: Marc Zyngier @ 2021-08-25 10:24 UTC (permalink / raw)
  To: Barry Song
  Cc: Bjorn Helgaas, Bjorn Helgaas, Jonathan Corbet, Jonathan.Cameron,
	bilbao, Greg Kroah-Hartman, leon, LKML, linux-pci, Linuxarm,
	luzmaximilian, mchehab+huawei, schnelle, Barry Song,
	Thomas Gleixner

On Tue, 24 Aug 2021 22:29:59 +0100,
Barry Song <21cnbao@gmail.com> wrote:
> 
> On Wed, Aug 25, 2021 at 8:51 AM Barry Song <21cnbao@gmail.com> wrote:
> >
> > On Sat, Aug 21, 2021 at 10:42 PM Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > Hi Bjorn,
> > >
> > > On Sat, 21 Aug 2021 00:33:28 +0100,
> > > Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > >
> > > > [+cc Thomas, Marc]
> > > >
> > > > On Sat, Aug 21, 2021 at 10:37:43AM +1200, Barry Song wrote:
> > > > > From: Barry Song <song.bao.hua@hisilicon.com>
> > > > >
> > > > > /sys/bus/pci/devices/.../irq sysfs ABI is very confusing at this
> > > > > moment especially for MSI-X cases.
> > > >
> > > > AFAICT this patch *only* affects MSI-X.  So are you saying the sysfs
> > > > ABI is fine for MSI but confusing for MSI-X?
> > > >
> > > > > While MSI sets IRQ to the first
> > > > > number in the vector, MSI-X does nothing for this though it saves
> > > > > default_irq in msix_setup_entries(). Weird the saved default_irq
> > > > > for MSI-X is never used in pci_msix_shutdown(), which is quite
> > > > > different with pci_msi_shutdown(). Thus, this patch moves to show
> > > > > the first IRQ number which is from the first msi_entry for MSI-X.
> > > > > Hopefully, this can make IRQ ABI more clear and more consistent.
> > > > >
> > > > > Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > > > Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
> > > > > ---
> > > > >  drivers/pci/msi.c | 6 ++++++
> > > > >  1 file changed, 6 insertions(+)
> > > > >
> > > > > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> > > > > index 9232255..6bbf81b 100644
> > > > > --- a/drivers/pci/msi.c
> > > > > +++ b/drivers/pci/msi.c
> > > > > @@ -771,6 +771,7 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
> > > > >     int ret;
> > > > >     u16 control;
> > > > >     void __iomem *base;
> > > > > +   struct msi_desc *desc;
> > > > >
> > > > >     /* Ensure MSI-X is disabled while it is set up */
> > > > >     pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0);
> > > > > @@ -814,6 +815,10 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
> > > > >     pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
> > > > >
> > > > >     pcibios_free_irq(dev);
> > > > > +
> > > > > +   desc = first_pci_msi_entry(dev);
> > > > > +   dev->irq = desc->irq;
> > > >
> > > > This change is not primarily about sysfs.  This is about changing
> > > > "dev->irq" when MSI-X is enabled, and it's only incidental that sysfs
> > > > reflects that.
> > > >
> > > > So we need to know the effect of changing dev->irq.  Drivers may use
> > > > the value of dev->irq, and I'm *guessing* this change shouldn't break
> > > > them since we already do this for MSI, but I'd like some more expert
> > > > opinion than mine :)
> > > >
> > > > For MSI we have:
> > > >
> > > >   msi_capability_init
> > > >     msi_setup_entry
> > > >       entry = alloc_msi_entry(nvec)
> > > >       entry->msi_attrib.default_irq = dev->irq;     /* Save IOAPIC IRQ */
> > > >     dev->irq = entry->irq;
> > > >
> > > >   pci_msi_shutdown
> > > >     /* Restore dev->irq to its default pin-assertion IRQ */
> > > >     dev->irq = desc->msi_attrib.default_irq;
> > > >
> > > > and for MSI-X we have:
> > > >
> > > >   msix_capability_init
> > > >     msix_setup_entries
> > > >       for (i = 0; i < nvec; i++)
> > > >         entry = alloc_msi_entry(1)
> > > >       entry->msi_attrib.default_irq = dev->irq;
> > > >
> > > >   pci_msix_shutdown
> > > >     for_each_pci_msi_entry(entry, dev)
> > > >       __pci_msix_desc_mask_irq
> > > > +   dev->irq = entry->msi_attrib.default_irq;   # added by this patch
> > > >
> > > >
> > > > Things that seem strange to me:
> > > >
> > > >   - The msi_setup_entry() comment "Save IOAPIC IRQ" seems needlessly
> > > >     specific; maybe it should be "INTx IRQ".
> > > >
> > > >   - The pci_msi_shutdown() comment "Restore ... pin-assertion IRQ"
> > > >     should match the msi_setup_entry() one, e.g., maybe it should also
> > > >     be "INTx IRQ".  There are no INTx or IOAPIC pins in PCIe.
> > > >
> > > >   - The only use of .default_irq is to save and restore dev->irq, so
> > > >     it looks like a per-device thing, not a per-vector thing.
> > > >
> > > >     In msi_setup_entry() there's only one msi_entry, so there's only
> > > >     one saved .default_irq.
> > > >
> > > >     In msix_setup_entries(), we get nvecs msi_entry structs, and we
> > > >     get a saved .default_irq in each one?
> > >
> > > That's a key point.
> > >
> > > Old-school PCI/MSI is represented by a single interrupt, and you
> > > *could* somehow make it relatively easy for drivers that only
> > > understand INTx to migrate to MSI if you replaced whatever is held in
> > > dev->irq (which should only represent the INTx mapping) with the MSI
> > > interrupt number. Which I guess is what the MSI code is doing.
> > >
> > > This is the 21st century, and nobody should ever rely on such horror,
> > > but I'm sure we do have such drivers in the tree. Boo.
> > >
> > > However, this *cannot* hold true for Multi-MSI, nor MSI-X, because
> > > there is a plurality of interrupts. Even worse, for MSI-X, there is
> > > zero guarantee that the allocated interrupts will be in a contiguous
> > > space.
> > >
> > > Given that, what is dev->irq good for? "Absolutely Nothing! (say it
> > > again!)".
> > >
> > > MSI-X is not something you can "accidentally" use. You have to
> > > actively embrace it. In all honesty, this patch tries to move in the
> > > wrong direction. If anything, we should kill this hack altogether and
> > > fix the (handful of?) drivers that rely on it. That'd actually be a
> > > good way to find whether they are still worth keeping in the tree. And
> > > if it breaks too many of them, then at least we'll know where we
> > > stand.
> > >
> > > I'd be tempted to leave the below patch simmer in -next for a few
> > > weeks and see if how many people shout:
> > >
> > > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> > > index e5e75331b415..2be9a01cbe72 100644
> > > --- a/drivers/pci/msi.c
> > > +++ b/drivers/pci/msi.c
> > > @@ -591,7 +591,6 @@ msi_setup_entry(struct pci_dev *dev, int nvec, struct irq_affinity *affd)
> > >         entry->msi_attrib.is_virtual    = 0;
> > >         entry->msi_attrib.entry_nr      = 0;
> > >         entry->msi_attrib.maskbit       = !!(control & PCI_MSI_FLAGS_MASKBIT);
> > > -       entry->msi_attrib.default_irq   = dev->irq;     /* Save IOAPIC IRQ */
> > >         entry->msi_attrib.multi_cap     = (control & PCI_MSI_FLAGS_QMASK) >> 1;
> > >         entry->msi_attrib.multiple      = ilog2(__roundup_pow_of_two(nvec));
> > >
> > > @@ -682,7 +681,6 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
> > >         dev->msi_enabled = 1;
> > >
> > >         pcibios_free_irq(dev);
> > > -       dev->irq = entry->irq;
> > >         return 0;
> > >  }
> > >
> > > @@ -742,7 +740,6 @@ static int msix_setup_entries(struct pci_dev *dev, void __iomem *base,
> > >                 entry->msi_attrib.is_virtual =
> > >                         entry->msi_attrib.entry_nr >= vec_count;
> > >
> > > -               entry->msi_attrib.default_irq   = dev->irq;
> > >                 entry->mask_base                = base;
> > >
> > >                 addr = pci_msix_desc_addr(entry);
> > > @@ -964,8 +961,6 @@ static void pci_msi_shutdown(struct pci_dev *dev)
> > >         mask = msi_mask(desc->msi_attrib.multi_cap);
> > >         msi_mask_irq(desc, mask, 0);
> > >
> > > -       /* Restore dev->irq to its default pin-assertion IRQ */
> > > -       dev->irq = desc->msi_attrib.default_irq;
> > >         pcibios_alloc_irq(dev);
> > >  }
> > >
> > > diff --git a/include/linux/msi.h b/include/linux/msi.h
> > > index e8bdcb83172b..a631664c1c38 100644
> > > --- a/include/linux/msi.h
> > > +++ b/include/linux/msi.h
> > > @@ -114,7 +114,6 @@ struct ti_sci_inta_msi_desc {
> > >   * @maskbit:   [PCI MSI/X] Mask-Pending bit supported?
> > >   * @is_64:     [PCI MSI/X] Address size: 0=32bit 1=64bit
> > >   * @entry_nr:  [PCI MSI/X] Entry which is described by this descriptor
> > > - * @default_irq:[PCI MSI/X] The default pre-assigned non-MSI irq
> > >   * @mask_pos:  [PCI MSI]   Mask register position
> > >   * @mask_base: [PCI MSI-X] Mask register base address
> > >   * @platform:  [platform]  Platform device specific msi descriptor data
> > > @@ -148,7 +147,6 @@ struct msi_desc {
> > >                                 u8      is_64           : 1;
> > >                                 u8      is_virtual      : 1;
> > >                                 u16     entry_nr;
> > > -                               unsigned default_irq;
> > >                         } msi_attrib;
> > >                         union {
> > >                                 u8      mask_pos;
> > >
> >
> > We will also need the below change as  pci_irq_vector() depends on
> > dev->irq for the MSI case.
> >
> > int pci_irq_vector(struct pci_dev *dev, unsigned int nr)
> > {
> >         if (dev->msix_enabled) {
> >                 struct msi_desc *entry;
> >                 int i = 0;
> >
> >                 for_each_pci_msi_entry(entry, dev) {
> >                         if (i == nr)
> >                                 return entry->irq;
> >                         i++;
> >                 }
> >                 WARN_ON_ONCE(1);
> >                 return -EINVAL;
> >         }
> >
> >         if (dev->msi_enabled) {
> >                 struct msi_desc *entry = first_pci_msi_entry(dev);
> >
> >                 if (WARN_ON_ONCE(nr >= entry->nvec_used))
> >                         return -EINVAL;
> >
> > +                return entry->irq + nr;
> >         } else {
> >                 if (WARN_ON_ONCE(nr > 0))
> >                         return -EINVAL;
> >         }
> >
> >
> > -        return dev->irq + nr;
> > +       return dev->irq;
> > }
> > EXPORT_SYMBOL(pci_irq_vector);
> >
> 
> And here:
> 
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -401,7 +401,7 @@ static void __pci_restore_msi_state(struct pci_dev *dev)
>         if (!dev->msi_enabled)
>                 return;
> 
> -       entry = irq_get_msi_desc(dev->irq);
> +       entry = first_pci_msi_entry(dev);
> 
>         pci_intx_for_msi(dev, 0);
>         pci_msi_set_enable(dev, 0);
> 
> since drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c is the only
> one calling pci_restore_msi_state(), will probably require one test
> from the driver maintainers.

All good catches. This needs tons of testing, and I fully expect
regressions (at least drivers falling back to INTx when they were
planning to use MSIs).

I'll try to roll a proper patch once 5.14 is out.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2021-08-25 10:52 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-20 22:37 [PATCH v2 0/2] PCI/MSI: Clarify the IRQ sysfs ABI for PCI devices Barry Song
2021-08-20 22:37 ` [PATCH v2 1/2] PCI/MSI: Fix the confusing IRQ sysfs ABI for MSI-X Barry Song
2021-08-20 23:33   ` Bjorn Helgaas
2021-08-21 10:42     ` Marc Zyngier
2021-08-21 22:14       ` Barry Song
2021-08-21 22:41         ` Barry Song
2021-08-23 10:33           ` Marc Zyngier
2021-08-24 19:25           ` Bjorn Helgaas
2021-08-23 10:30         ` Marc Zyngier
2021-08-23 11:03           ` Barry Song
2021-08-23 11:28             ` Marc Zyngier
2021-08-23 22:46               ` Barry Song
2021-08-24 19:34                 ` Bjorn Helgaas
2021-08-25  9:45                   ` Marc Zyngier
2021-08-24 20:51       ` Barry Song
2021-08-24 21:29         ` Barry Song
2021-08-25 10:24           ` Marc Zyngier
2021-08-24 22:51             ` Barry Song
2021-08-20 22:37 ` [PATCH v2 2/2] Documentation: ABI: sysfs-bus-pci: Add description for IRQ entry Barry Song

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).