LKML Archive on lore.kernel.org
From: Russ Weight <russell.h.weight@intel.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Cornelia Huck <cohuck@redhat.com>,
	"Adler, Michael" <michael.adler@intel.com>,
	"Whisonant, Tim" <tim.whisonant@intel.com>, <kvm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Tom Rix <trix@redhat.com>
Subject: Re: BUG REPORT: vfio_pci driver
Date: Tue, 17 Aug 2021 10:16:28 -0700
Message-ID: <0d16c181-ab34-8d7e-d9c7-5324a8c24900@intel.com>
In-Reply-To: <20210813160907.7b143b51.alex.williamson@redhat.com>

On 8/13/21 3:09 PM, Alex Williamson wrote:
> On Fri, 13 Aug 2021 11:34:51 -0700
> Russ Weight <russell.h.weight@intel.com> wrote:
>
>> Bug Description:
>>
>> A bug in the vfio_pci driver was reported in conjunction with work on FPGA
> This looks like the documented behavior of an IRQ index reporting the
> VFIO_IRQ_INFO_NORESIZE flag.  We can certainly work towards trying to
> remove the flag from this index, but it seems the userspace driver is
> currently ignoring the flag and expecting exactly the behavior the flag
> indicates is not available.  Thanks,

Thanks for the quick reply, Alex. Yes, we misunderstood the expected
behavior. We have adapted our library code and everything is
working now.

- Russ
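
For illustration only, here is a minimal sketch of one way a user-space library could cope with a NORESIZE index: describe all vectors in a single VFIO_DEVICE_SET_IRQS request instead of one request per vector. This is not the actual OPAE change. The helper name build_msix_trigger_set() is hypothetical, and the sketch assumes the vectors start at 0 and that one eventfd per vector is available.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <linux/vfio.h>

/*
 * Hypothetical helper (not part of OPAE or the kernel): build one
 * vfio_irq_set that covers MSI-X vectors 0..count-1, so the index is
 * sized once for every vector the application needs.
 * The caller owns the returned buffer and must free() it.
 */
static struct vfio_irq_set *build_msix_trigger_set(const int *fds,
                                                   unsigned int count)
{
    assert(fds != NULL && count > 0);

    /* Header plus one 32-bit eventfd per vector in the data[] payload. */
    size_t payload = (size_t)count * sizeof(int);
    size_t argsz = sizeof(struct vfio_irq_set) + payload;

    struct vfio_irq_set *set = calloc(1, argsz);
    if (set == NULL)
        return NULL;

    set->argsz = (__u32)argsz;
    set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
    set->index = VFIO_PCI_MSIX_IRQ_INDEX;
    set->start = 0;
    set->count = count;
    memcpy(set->data, fds, payload);

    return set;
}
```

The resulting buffer would then be passed once to ioctl(device_fd, VFIO_DEVICE_SET_IRQS, set), where device_fd is assumed to be an already opened VFIO device file descriptor. As far as I understand the interface, an fd of -1 in the array leaves that vector without a trigger, which would let a library reserve vectors it has not bound yet.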


>
> Alex
>
>> cards. We were able to reproduce and root-cause the bug using systemtap.
>> The original bug description is below. An understanding of the referenced
>> dfl and opae tools is not required - it is the sequence of IOCTL calls and
>> IRQ vectors that matters:
>>
>>> I’m trying to get an example AFU working that uses 2 IRQs, active at the same 
>>> time. I’m hitting what looks to be a dfl_pci driver bug.
>>>
>>> The code tries to allocate two IRQ vectors: 0 and 1. I see opaevfio.c doing the 
>>> right thing, picking the MSIX index. Allocating either IRQ 0 or IRQ 1 works fine 
>>> and I confirm that the VFIO_DEVICE_SET_IRQS looks reasonable, choosing MSIX and 
>>> either start of 0 or 1 and count 1.
>>>
>>> Note that opaevfio.c always passes count 1, so it will make separate calls for 
>>> each IRQ vector.
>>>
>>> When I try to allocate both, I see the following:
>>>
>>>   * If the VFIO_DEVICE_SET_IRQS ioctl is called first with start 0 and then
>>>     start 1 (always count 1), the start 1 (second) ioctl trap returns EINVAL.
>>>   * If I set up the vectors in decreasing order, so start 1 followed by start 0,
>>>     the program works!
>>>   * I ruled out OPAE SDK user space problems by setting up my program to
>>>     allocate in increasing order, which would normally fail. I changed only the
>>>     ioctl call in user space opaevfio.c, inverting bit 0 of start so that the
>>>     driver is called in decreasing index order. Of course this binds the wrong
>>>     vectors to the fds, but I don’t care about that for now. This works! From
>>>     this, I conclude that it can’t be a user space problem since the difference
>>>     between working and failing is solely the order in which IRQ vectors are
>>>     bound in ioctl calls.  
>> The EINVAL is coming from vfio_msi_set_block() here:
>> https://github.com/torvalds/linux/blob/master/drivers/vfio/pci/vfio_pci_intrs.c#L373
>>
>> vfio_msi_set_block() is being called from vfio_pci_set_msi_trigger() here on
>> the second IRQ request:
>> https://github.com/torvalds/linux/blob/master/drivers/vfio/pci/vfio_pci_intrs.c#L530
>>
>> We believe the bug is in vfio_pci_set_msi_trigger(), in the 2nd parameter to the call
>> to vfio_msi_enable() here:
>> https://github.com/torvalds/linux/blob/master/drivers/vfio/pci/vfio_pci_intrs.c#L533
>>
>> In both the passing and failing cases, the first IRQ request results in a call
>> to vfio_msi_enable() at line 533 and the second IRQ request results in the
>> call to vfio_msi_set_block() at line 530. It is during the first IRQ request
>> that vfio_msi_enable() sets vdev->num_ctx based on the 2nd parameter (nvec).
>> vdev->num_ctx is part of the conditional that results in the EINVAL for the
>> failing case.
>>
>> In the passing case, vdev->num_ctx is 2. In the failing case, it is 1.
>>
>> I am attaching two text files containing trace information from systemtap: one for
>> the failing case and one for the passing case. They contain a lot more information
>> than is needed, but if you search for vfio_pci_set_msi_trigger and vfio_msi_set_block,
>> you will see values for some of the call parameters.
>>
>> - Russ
>>
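
The ordering dependence described in the report can be reproduced with a small user-space model. This is a simplified sketch, not the kernel code: the struct and function names are hypothetical, and it assumes only what the thread states, namely that the first request sizes the index (num_ctx becomes start + count) and that later requests are rejected with EINVAL when they fall outside that range. The exact form of the range check is my assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of one NORESIZE interrupt index (hypothetical). */
struct msix_model {
    bool enabled;          /* has the index been enabled yet? */
    unsigned int num_ctx;  /* number of vectors fixed at enable time */
};

/*
 * Model of a trigger request for vectors [start, start + count).
 * Returns 0 on success or -EINVAL, mirroring the behavior in the report.
 * count == 0 is treated as invalid here to keep the model small.
 */
static int model_set_trigger(struct msix_model *m,
                             unsigned int start, unsigned int count)
{
    assert(m != NULL);

    if (count == 0)
        return -EINVAL;

    if (!m->enabled) {
        /* First request: the index is sized once and cannot grow later. */
        m->enabled = true;
        m->num_ctx = start + count;
        return 0;
    }

    /* Later requests must fit inside the range chosen at enable time. */
    if (start >= m->num_ctx || start + count > m->num_ctx)
        return -EINVAL;

    return 0;
}
```

In this model, requesting start 0 and then start 1 (count 1 each) leaves num_ctx at 1 and the second request fails, while start 1 followed by start 0 sets num_ctx to 2 and both succeed. A single request with start 0 and count 2 also sizes the index for both vectors.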

