LKML Archive on lore.kernel.org
From: Halil Pasic <pasic@linux.ibm.com>
To: Cornelia Huck <cohuck@redhat.com>,
	Alex Williamson <alex.williamson@redhat.com>
Cc: kwankhede@nvidia.com, Dong Jia <bjsdjshi@linux.vnet.ibm.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 0/2] vfio/mdev: Device namespace protection
Date: Wed, 23 May 2018 14:29:28 +0200
Message-ID: <dfeb44cc-39bf-1015-62d4-4644e6a8cf7b@linux.ibm.com>
In-Reply-To: <20180523105641.0d89701b.cohuck@redhat.com>



On 05/23/2018 10:56 AM, Cornelia Huck wrote:
> On Tue, 22 May 2018 12:38:29 -0600
> Alex Williamson <alex.williamson@redhat.com> wrote:
> 
>> On Tue, 22 May 2018 19:17:07 +0200
>> Halil Pasic <pasic@linux.ibm.com> wrote:
>>
>>>   From vfio-ccw perspective I join Connie's assessment: vfio-ccw should
>>> be fine with these changes. I'm however not too deeply involved with
>>> the mdev framework, thus I don't feel comfortable r-b-ing. That results
>>> in
>>> Acked-by: Halil Pasic <pasic@linux.ibm.com>
>>> for both patches.
>>>
>>> While at it I have would like to ask about the semantics and intended
>>> use of the mdev interfaces.
>>>
>>> static int vfio_ccw_sch_probe(struct subchannel *sch)
>>> {
>>>
>>> /* HALIL: 8< Not so interesting stuff happens here. >8 */
>>
>> This was interesting:
>>
>> 	private->state = VFIO_CCW_STATE_NOT_OPER;
>>
>>>           ret = vfio_ccw_mdev_reg(sch);
>>>           if (ret)
>>>                   goto out_disable;
>>> /*
>>>    * HALIL:
>>>    * This might be racy. Somewhere in vfio_ccw_mdev_reg() the create attribute
>>>    * is made available (it calls mdev_register_device()). For instance create will
>>>    * attempt to decrement private->avail which is initialized below. I fail to
>>>    * understand how is  this well synchronized.
>>>    */
>>>           INIT_WORK(&private->io_work, vfio_ccw_sch_io_todo);
>>>           atomic_set(&private->avail, 1);
>>>           private->state = VFIO_CCW_STATE_STANDBY;
>>>
>>>           return 0;
>>>
>>> out_disable:
>>>           cio_disable_subchannel(sch);
>>> out_free:
>>>           dev_set_drvdata(&sch->dev, NULL);
>>>           kfree(private);
>>>           return ret;
>>> }
>>>
>>> Should not initialization  of go before mdev_register_device(), and then rolled
>>> back if necessary if mdev_register_device() fails?
>>>
>>> In practice it does not seem very likely that userspace can trigger
>>> mdev_device_create() before vfio_ccw_sch_probe() finishes so it should
>>> not be a practical problem. But I would like to understand how synchronization
>>> is supposed to work.
>>>
>>> [Added Dong Jia, maybe he is also able to answer my question.]
>>
>> vfio_ccw_mdev_create() requires that private->state is not
>> VFIO_CCW_STATE_NOT_OPER but vfio_ccw_sch_probe() explicitly sets state
>> to this value before calling vfio_ccw_mdev_reg(), so a create should
>> return -ENODEV if racing with parent registration.  Is there something
>> else that I'm missing?  Thanks,
>>


Disclaimer: I have not done much kernel work up until now. I still have
much to learn.

I mostly agree with your analysis but I'm not sure if the conclusion should be
'and thus everything is good' or 'and thus indeed we do have a race, a
poorly handled one'.

One thing I'm not sure about: can atomic_set(&private->avail, 1) and
private->state = VFIO_CCW_STATE_STANDBY be perceived as reordered by,
e.g., some other CPU, and thus by vfio_ccw_mdev_create()? I tried to
figure it out based on Documentation/atomic_t.txt but was not very
successful. If these can be reordered, we could observe -EPERM instead
of -ENODEV, I think.
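
To make the reordering question concrete, here is a minimal userspace
sketch using C11 atomics, not the kernel primitives. Every model_* name
is hypothetical, and the create logic only mirrors what is described in
this thread (-ENODEV while NOT_OPER, -EPERM when avail cannot be
decremented); it is not the driver's code. The point it illustrates: if
the state is published with a release store and read with an acquire
load, a reader that sees STANDBY must also see avail == 1, so the
-EPERM-instead-of-ENODEV outcome is excluded. In kernel code the
analogous primitives would be smp_store_release() and
smp_load_acquire().

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for the driver states; not the kernel definitions. */
enum model_state { MODEL_NOT_OPER, MODEL_STANDBY };

struct model_private {
	atomic_int avail;	/* models private->avail */
	atomic_int state;	/* models private->state */
};

/* Writer side: models the tail of vfio_ccw_sch_probe(). */
static void model_probe_finish(struct model_private *p)
{
	atomic_store_explicit(&p->avail, 1, memory_order_relaxed);
	/* Release: the avail store above is visible to any reader that
	 * observes STANDBY through an acquire load. */
	atomic_store_explicit(&p->state, MODEL_STANDBY, memory_order_release);
}

/* Reader side: models the checks of the create callback. */
static int model_create(struct model_private *p)
{
	int old;

	if (atomic_load_explicit(&p->state, memory_order_acquire) ==
	    MODEL_NOT_OPER)
		return -ENODEV;

	/* Decrement avail only if it is positive. */
	old = atomic_load_explicit(&p->avail, memory_order_relaxed);
	do {
		if (old <= 0)
			return -EPERM;
	} while (!atomic_compare_exchange_weak_explicit(&p->avail, &old,
							old - 1,
							memory_order_relaxed,
							memory_order_relaxed));
	return 0;
}

/* Single-threaded walk through the three outcomes discussed here. */
void model_walkthrough(void)
{
	struct model_private p;

	atomic_init(&p.avail, 0);
	atomic_init(&p.state, MODEL_NOT_OPER);

	assert(model_create(&p) == -ENODEV);	/* before probe finishes */
	model_probe_finish(&p);
	assert(model_create(&p) == 0);		/* the one available slot */
	assert(model_create(&p) == -EPERM);	/* nothing left */
}
```

With both accesses relaxed instead, nothing in the C11 model orders the
two stores as seen from another thread, which is exactly the case I am
unsure about for the plain assignment in the probe function.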

Furthermore, from your analysis I deduce that the client code (I think
mdev calls it vendor code) may rely on mdev_register_device() containing
a (RELEASE) barrier. We use a mutex in there, so the barrier is there.
And the client code may rely on an (ACQUIRE) barrier before the create
callback is called. That should also be true, and was true in the past
too, again because of mutex usage.


>> Alex
> 
> No, I think your understanding is correct. We move the state from
> NOT_OPER to STANDBY only after we're set up completely, so our create
> callback will simply fail early with -ENODEV. This looks fine to me.
> 

This -ENODEV looks strange to me. Which device does not exist? Is
userspace supposed to retry on this? It's not even -EAGAIN. Is it
documented somewhere?

If it's unavoidable (and I don't see why it would be), I would prefer
-EAGAIN. I think throwing an -ENODEV at our userspace once in a blue
moon (if ever), because that is the way we 'handle' races in our code
instead of avoiding them, is not very friendly.
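
A minimal sketch of what avoiding the window could look like, assuming
the ordering I asked about earlier: finish all initialization first,
register last, and roll back if registration fails. This is a plain
userspace model; the sketch_* names are hypothetical, sketch_register()
merely stands in for vfio_ccw_mdev_reg(), and the -ENOMEM it returns is
an arbitrary error chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not the kernel definitions. */
enum sketch_state { SKETCH_NOT_OPER, SKETCH_STANDBY };

struct sketch_private {
	int avail;
	enum sketch_state state;
	bool registered;
};

/* Stand-in for the registration step; fails on request. */
static int sketch_register(struct sketch_private *p, bool fail)
{
	if (fail)
		return -ENOMEM;
	p->registered = true;
	return 0;
}

static int sketch_probe(struct sketch_private *p, bool fail_register)
{
	int ret;

	/* Complete the setup before create can possibly be reached. */
	p->avail = 1;
	p->state = SKETCH_STANDBY;

	ret = sketch_register(p, fail_register);
	if (ret) {
		/* Registration failed: undo the setup done above. */
		p->state = SKETCH_NOT_OPER;
		p->avail = 0;
		return ret;
	}
	return 0;
}

/* Exercise both the success path and the rollback path. */
void sketch_walkthrough(void)
{
	struct sketch_private ok = { 0 };
	struct sketch_private bad = { 0 };

	assert(sketch_probe(&ok, false) == 0);
	assert(ok.registered && ok.state == SKETCH_STANDBY && ok.avail == 1);

	assert(sketch_probe(&bad, true) == -ENOMEM);
	assert(!bad.registered && bad.state == SKETCH_NOT_OPER &&
	       bad.avail == 0);
}
```

With this ordering there is no point at which create is reachable but
the state is still NOT_OPER, so userspace would never see the transient
-ENODEV in the first place.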

And I'm not sure -EPERM is not possible (see my statement
about reordering of the writes above).


Regards,
Halil
