LKML Archive on lore.kernel.org
* Enabling RO on a VF
From: Haakon Bugge @ 2021-10-01 11:05 UTC
  To: Leon Romanovsky, Doug Ledford, Jason Gunthorpe, Bjorn Helgaas
  Cc: OFED mailing list, linux-pci, LKML

Hey,

Commit 1477d44ce47d ("RDMA/mlx5: Enable Relaxed Ordering by default for kernel ULPs") uses pcie_relaxed_ordering_enabled() to check whether Relaxed Ordering (RO) can be enabled. This function checks whether the Enable Relaxed Ordering bit in the Device Control register is set. However, on a VF, this bit is RsvdP (Reserved and Preserved: reserved for future RW implementations; the register bits are read-only and must return zero when read, and software must preserve the value read when writing to them).

Hence, AFAICT, RO will not be enabled when using a VF.

How can that be fixed?

Thxs, Håkon
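The check described above can be sketched in user space as follows. This is not the kernel's code: `cfg_read16()`, `find_pcie_cap()` and `ro_enabled()` are hypothetical helpers operating on a 256-byte copy of a function's config space, and the offsets and bit values are assumed from the PCI/PCIe specs (as mirrored in Linux's `include/uapi/linux/pci_regs.h`). It walks the capability list to the PCI Express capability and tests bit 4 (0x0010) of Device Control, which on a VF is RsvdP and therefore reads as zero.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets and bits assumed from the PCI/PCIe specs (cf. linux/pci_regs.h). */
#define CFG_SPACE_SIZE          256
#define PCI_STATUS              0x06   /* Status register */
#define PCI_STATUS_CAP_LIST     0x10   /* capability list present */
#define PCI_CAPABILITY_LIST     0x34   /* pointer to first capability */
#define PCI_CAP_ID_EXP          0x10   /* PCI Express capability ID */
#define PCI_EXP_DEVCTL          0x08   /* Device Control, relative to the cap */
#define PCI_EXP_DEVCTL_RELAX_EN 0x0010 /* Enable Relaxed Ordering (bit 4) */

/* Config space registers are little-endian. */
static uint16_t cfg_read16(const uint8_t *cfg, size_t off)
{
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/* Return the offset of the PCIe capability, or 0 if it is not found. */
static size_t find_pcie_cap(const uint8_t *cfg)
{
	if (!(cfg_read16(cfg, PCI_STATUS) & PCI_STATUS_CAP_LIST))
		return 0;

	size_t pos = cfg[PCI_CAPABILITY_LIST] & ~3u;
	/* Bound the walk so a malformed (cyclic) list cannot loop forever. */
	for (int ttl = 48; pos >= 0x40 && ttl > 0; ttl--) {
		if (cfg[pos] == PCI_CAP_ID_EXP)
			return pos;
		pos = cfg[pos + 1] & ~3u;
	}
	return 0;
}

/* Same idea as pcie_relaxed_ordering_enabled(): is DevCtl bit 4 set? */
bool ro_enabled(const uint8_t *cfg)
{
	size_t cap = find_pcie_cap(cfg);

	if (cap == 0 || cap + PCI_EXP_DEVCTL + 2 > CFG_SPACE_SIZE)
		return false;
	return (cfg_read16(cfg, cap + PCI_EXP_DEVCTL) & PCI_EXP_DEVCTL_RELAX_EN) != 0;
}
```

With a VF's config space, where the RO bit reads as zero, `ro_enabled()` returns false even if the parent PF has RO enabled, which is the behaviour reported here.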

* Re: Enabling RO on a VF
From: Jason Gunthorpe @ 2021-10-01 11:54 UTC
  To: Haakon Bugge
  Cc: Leon Romanovsky, Doug Ledford, Bjorn Helgaas, OFED mailing list,
	linux-pci, LKML

On Fri, Oct 01, 2021 at 11:05:15AM +0000, Haakon Bugge wrote:
> Hence, AFAICT, RO will not be enabled when using a VF.
> 
> How can that be fixed?

When QEMU takes a VF and presents it as a PF in a VM, it must emulate
the RO bit and return one.

Jason

* Re: Enabling RO on a VF
From: Haakon Bugge @ 2021-10-01 11:59 UTC
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Doug Ledford, Bjorn Helgaas, OFED mailing list,
	linux-pci, LKML

> On 1 Oct 2021, at 13:54, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> When qemu takes a VF and turns it into a PF in a VM it must emulate
> the RO bit and return one

I have a pass-through VF:

# lspci -s ff:00.0 -vvv
ff:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
[]
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-

Håkon
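For reference, the second DevCtl line printed by lspci reports individual Device Control bits as `+` (set) or `-` (clear). A small decoder, assuming the bit assignments of the PCIe base spec (RlxdOrd is bit 4, 0x0010; FLReset is bit 15, 0x8000), shows that `RlxdOrd-` in the output above simply means bit 4 of the VF's Device Control register reads as zero. `format_devctl()` is a hypothetical helper, not lspci's implementation.

```c
#include <stdint.h>
#include <stdio.h>

/* DevCtl bits shown on lspci's second DevCtl line (bit positions per PCIe spec). */
struct devctl_flag {
	const char *name;
	uint16_t mask;
};

static const struct devctl_flag devctl_flags[] = {
	{ "RlxdOrd",   0x0010 }, /* Enable Relaxed Ordering */
	{ "ExtTag",    0x0100 }, /* Extended Tag Field Enable */
	{ "PhantFunc", 0x0200 }, /* Phantom Functions Enable */
	{ "AuxPwr",    0x0400 }, /* Aux Power PM Enable */
	{ "NoSnoop",   0x0800 }, /* Enable No Snoop */
	{ "FLReset",   0x8000 }, /* Initiate Function Level Reset */
};

/* Write "Name+" for set bits and "Name-" for clear bits, space separated. */
void format_devctl(uint16_t devctl, char *buf, size_t len)
{
	size_t used = 0;

	if (len == 0)
		return;
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(devctl_flags) / sizeof(devctl_flags[0]); i++) {
		int n = snprintf(buf + used, len - used, "%s%s%c",
				 i ? " " : "", devctl_flags[i].name,
				 (devctl & devctl_flags[i].mask) ? '+' : '-');
		if (n < 0 || (size_t)n >= len - used)
			break; /* output truncated; buf stays NUL-terminated */
		used += (size_t)n;
	}
}
```

Decoding 0x0000 yields the same flag string as the VF output above, with every bit clear.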

* Re: Enabling RO on a VF
From: Jason Gunthorpe @ 2021-10-01 12:01 UTC
  To: Haakon Bugge
  Cc: Leon Romanovsky, Doug Ledford, Bjorn Helgaas, OFED mailing list,
	linux-pci, LKML

On Fri, Oct 01, 2021 at 11:59:15AM +0000, Haakon Bugge wrote:
> I have a pass-through VF:
> 
> # lspci -s ff:00.0 -vvv
> ff:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
> []
> 		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
> 			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-

Like I said, it is a problem in the QEMU area.

Jason

* Re: Enabling RO on a VF
From: Si-Wei Liu @ 2021-10-05 23:09 UTC
  To: Jason Gunthorpe
  Cc: Haakon Bugge, Leon Romanovsky, Doug Ledford, Bjorn Helgaas,
	OFED mailing list, linux-pci, LKML

On Fri, Oct 1, 2021 at 6:02 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> Like I said, it is a problem in the qemu area..
>
> Jason

Can you clarify why this is a problem in the QEMU area?

Even though the Mellanox device might well support it (on a VF), there's no
way for QEMU to really know whether an arbitrary passthrough device
supports RO. It would either have to rely on the host kernel to check all
PCIe device functions up to the root port throughout the PCIe fabric,
or follow the PF's enable status, if it is capable of that at all. I don't
see what QEMU can achieve by just forcefully emulating the bit.

Not to mention that the current implementation only takes care of a broken
root port, not the intermediate switches:
https://lore.kernel.org/linux-arm-kernel/MWHPR12MB1600255ACFCD3FB3C80EB8B6C88B0@MWHPR12MB1600.namprd12.prod.outlook.com/

-Siwei

* Re: Enabling RO on a VF
From: Jason Gunthorpe @ 2021-10-05 23:28 UTC
  To: Si-Wei Liu
  Cc: Haakon Bugge, Leon Romanovsky, Doug Ledford, Bjorn Helgaas,
	OFED mailing list, linux-pci, LKML

On Tue, Oct 05, 2021 at 04:09:54PM -0700, Si-Wei Liu wrote:
> Can you clarify why this is a problem in the QEMU area?
> 
> Even though Mellanox device might well support it (on VF), there's no
> way for QEMU to really know if an arbitrary passthrough device may
> support RO. 

That isn't what the cap bit means.

The cap bit on the PF controls RO generation for the whole device;
clearing it disables RO completely.

If the PF's cap bit is disabled, then no VF can generate RO, and QEMU
should expose a wired-to-zero RO bit in the emulated PF.

If the cap bit is enabled, then the VFs could generate RO, depending on
their drivers, and QEMU should expose the bit defaulted to 1 in the
emulated PF.

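The read-side policy Jason describes can be sketched as a config-space read hook. This is only an illustration under stated assumptions: `emulate_vf_devctl_read()` is a hypothetical function, not QEMU's vfio code, and it assumes the hypervisor already knows whether RO is enabled on the parent PF. The guest-visible RlxdOrd bit then follows the PF instead of the VF's RsvdP bit.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_EXP_DEVCTL_RELAX_EN 0x0010 /* Enable Relaxed Ordering (bit 4) */

/*
 * Hypothetical hook: the value returned when the guest reads Device Control
 * of an assigned VF that is presented as a regular function.
 * vf_devctl:     raw value read from the VF (its RO bit is RsvdP, reads 0)
 * pf_ro_enabled: whether the RO enable bit is set on the parent PF
 */
uint16_t emulate_vf_devctl_read(uint16_t vf_devctl, bool pf_ro_enabled)
{
	if (pf_ro_enabled)
		return vf_devctl | PCI_EXP_DEVCTL_RELAX_EN;            /* VF may emit RO TLPs */
	return vf_devctl & (uint16_t)~PCI_EXP_DEVCTL_RELAX_EN;  /* wired to zero */
}
```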
> It either has to resort to the host kernel to detect all
> PCIe device functions up to the root port throughout the PCIe fabric,
> or it may follow PF's enabling status if it is at all capable. I don't
> see what QEMU can do by just forcefully emulating the bit?

IMHO the kernel/BIOS should be responsible for clearing the RO bit at
the PF if RO is not supportable in the environment. If RO is broken,
it is proper to prevent the device from using it completely.

> Not to mention the current implementation only takes care of broken
> root port but not the intermediate switches.
> https://lore.kernel.org/linux-arm-kernel/MWHPR12MB1600255ACFCD3FB3C80EB8B6C88B0@MWHPR12MB1600.namprd12.prod.outlook.com/

Which is what this message suggests doing.

Jason
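As an illustration of clearing RO at the PF, including the intermediate switches Si-Wei mentions, the sketch below walks every upstream port rather than only the root port. The `struct pcie_node` type and its `ro_broken` flag (standing in for a quirk that marks a port as mishandling RO) are hypothetical; this is not the kernel's actual quirk handling.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_EXP_DEVCTL_RELAX_EN 0x0010 /* Enable Relaxed Ordering (bit 4) */

/* Hypothetical, simplified view of a function in the PCIe hierarchy. */
struct pcie_node {
	bool ro_broken;             /* set by a quirk for a port known to mishandle RO */
	uint16_t devctl;            /* Device Control register value */
	struct pcie_node *upstream; /* NULL above the root port */
};

/* True only if no switch or root port between dev and the root complex is broken. */
bool ro_path_ok(const struct pcie_node *dev)
{
	for (const struct pcie_node *p = dev->upstream; p != NULL; p = p->upstream)
		if (p->ro_broken)
			return false;
	return true;
}

/* Clear the PF's RO enable, and with it RO for all its VFs, if the path cannot handle RO. */
void fixup_pf_ro(struct pcie_node *pf)
{
	if (!ro_path_ok(pf))
		pf->devctl &= (uint16_t)~PCI_EXP_DEVCTL_RELAX_EN;
}
```

Checking the whole path is what distinguishes this from a root-port-only check: a single broken switch is enough to keep RO off at the PF.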

* Re: Enabling RO on a VF
From: Si-Wei Liu @ 2021-10-12 17:57 UTC
  To: Jason Gunthorpe
  Cc: Haakon Bugge, Leon Romanovsky, Doug Ledford, Bjorn Helgaas,
	OFED mailing list, linux-pci, LKML

On Tue, Oct 5, 2021 at 4:28 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> That isn't what the cap bit means
>
> The cap bit on the PF completely disables generation of RO at the
> device at all.
>
> If the PF's cap bit is disabled then no VF can generate RO, and qemu
> should expose a wired to zero RO bit in the emulated PF.
>
> If the cap bit is enabled then the VFs could generate RO, depending on
> their drivers, and qemu should generate defaulted to 1 bit in the
> emulated PF.

Setting aside the broken root port and P2P DMA cases, let's say we
have an RO-enabled PF with a working upstream root port that fully
supports RO. As a VF mostly inherits the PF's state/config, no matter
what value the DevCtl RlxdOrd bit presents in the host, it doesn't mean
anything, although we know that disabling RO on the PF prohibits all
its child VFs from sending RO TLPs. There's no question about this
part. The real question, though, is whether the RlxdOrd cap bit for
the VF can be controlled individually, similar to the way it is
toggled on the PF. For example, suppose the RO cap bit for the VF
emulated by QEMU defaults to enabled, where the backing PF and all
child VFs have RO enabled. Will a PCI write of zero to the bit be able
to prevent RO TLPs initiated by that specific VF from being sent out,
mirroring the PF's behaviour? Is this specific to Mellanox VFs? More
broadly, should the VFs of arbitrary PCIe devices have that kind of
control at the individual VF level? I can't find anything in the PCIe
SR-IOV spec saying this should be the case.

-Siwei

* Re: Enabling RO on a VF
From: Jason Gunthorpe @ 2021-10-14 22:23 UTC
  To: Si-Wei Liu
  Cc: Haakon Bugge, Leon Romanovsky, Doug Ledford, Bjorn Helgaas,
	OFED mailing list, linux-pci, LKML

On Tue, Oct 12, 2021 at 10:57:16AM -0700, Si-Wei Liu wrote:
> Set the broken root port and the P2P DMA cases aside, let's say we
> have a RO enabled PF where there's a working root port upstream that
> well supports RO. As VF mostly inherits PF's state/config, no matter
> what value the DevCtl RlxdOrd bit presents in the host it doesn't mean
> anything,

Not quite. If the guest sees RlxdOrd enabled, then the guest can
expect that the device may send TLPs with relaxed ordering set.

> although we know getting RO disabled on the PF implies
> prohibiting RO TLP being sent by all its child VFs. There's no
> question for this part. The real problem though, is if the RlxdOrd cap
> bit for the VF can be controlled individually similar to the way the
> toggling on PF is done?

It cannot.

Just like today: QEMU wrongly reports RlxdOrd as disabled for a VF, but
that doesn't mean the VF cannot or does not issue Relaxed Ordering
TLPs.

Since the HW does not have this level of fine-grained control, a full
emulation of the RlxdOrd bit is not possible for VFs.

> For e.g, suppose the RO cap bit for the VF
> emulated by QEMU defaults to enabled where the backing PF and all
> child VFs have RO enabled. Will a PCI write of zero to the bit be able
> to prevent RO ULP initiated by that specific VF from being sent out,
> which is to resemble PF's behaviour? 

Nope. HW can't do it.

> This being the Mellanox VF's specifics? 

It is not Mellanox-specific; this is all in the PCI spec.

> More broadly, should the VFs for arbitrary PCIe devices have that
> kind of control on an individual VF's level? I don't find it
> anywhere in the PCIe SR-IOV spec that this should be the case.

They don't, and for this discussion it doesn't matter.

Your question was about how to enable relaxed ordering in guests, and
the answer is for QEMU to report RlxdOrd as enabled in the VF using
PCI config space emulation, and to continue not supporting changes to
the relaxed ordering mode of a VF (i.e. wired to enabled).

Jason
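The "wired to enabled" answer can be expressed as a write mask: the RO bit is treated as read-only in the emulated register, so a guest write of zero leaves it reported as set, consistent with the hardware having no per-VF RO control. `emulate_vf_devctl_write()` is hypothetical, and passing the guest's value through for all other DevCtl bits is an assumption of this sketch.

```c
#include <stdint.h>

#define PCI_EXP_DEVCTL_RELAX_EN 0x0010 /* Enable Relaxed Ordering (bit 4) */

/*
 * Hypothetical write handler for an emulated VF Device Control register.
 * Bits in ro_mask are read-only to the guest: their current value is kept
 * whatever the guest writes. All other bits take the guest's value.
 */
uint16_t emulate_vf_devctl_write(uint16_t current, uint16_t guest_val)
{
	const uint16_t ro_mask = PCI_EXP_DEVCTL_RELAX_EN;

	return (uint16_t)((guest_val & ~ro_mask) | (current & ro_mask));
}
```

If the PF has RO disabled and the bit is reported as zero, the same mask keeps it wired to zero instead, since the current value is preserved whichever way the guest writes.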

Thread overview: 8+ messages
2021-10-01 11:05 Enabling RO on a VF Haakon Bugge
2021-10-01 11:54 ` Jason Gunthorpe
2021-10-01 11:59   ` Haakon Bugge
2021-10-01 12:01     ` Jason Gunthorpe
2021-10-05 23:09       ` Si-Wei Liu
2021-10-05 23:28         ` Jason Gunthorpe
2021-10-12 17:57           ` Si-Wei Liu
2021-10-14 22:23             ` Jason Gunthorpe
