LKML Archive on lore.kernel.org
* [RFC PATCH 1/1] virtio: write back features before verify
@ 2021-09-30  1:20 Halil Pasic
  2021-09-30  8:04 ` Christian Borntraeger
                   ` (3 more replies)
  0 siblings, 4 replies; 52+ messages in thread
From: Halil Pasic @ 2021-09-30  1:20 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Wang, Xie Yongji, virtualization, linux-kernel
  Cc: Halil Pasic, markver, Cornelia Huck, Christian Borntraeger, linux-s390

This patch fixes a regression introduced by commit 82e89ea077b9
("virtio-blk: Add validation for block size in config space") and
enables similar checks in verify() on big endian platforms.

The problem with checking multi-byte config fields in the verify
callback, on big endian platforms, and with a possibly transitional
device is the following. The verify() callback is called between
config->get_features() and virtio_finalize_features(). If we have a
device that offered F_VERSION_1, then there are two possibilities:
either the device is transitional, and then it has to present the legacy
interface, i.e. a big endian config space until F_VERSION_1 is
negotiated, or we have a non-transitional device, which makes
F_VERSION_1 mandatory, and only implements the non-legacy interface and
thus presents a little endian config space. Because at this point we
can't know if the device is transitional or non-transitional, we can't
know whether we need to byte swap or not.
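
To make the ambiguity concrete, here is a minimal userspace sketch (not
kernel code). The helper names are made up for this illustration; the
only point is that the same four raw config bytes decode to very
different values depending on which byte order the reader assumes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode four raw config-space bytes as little
 * endian, which is what the non-legacy (VERSION_1) interface presents.
 */
static uint32_t cfg_decode_le32(const uint8_t *raw)
{
	assert(raw != NULL);
	return (uint32_t)raw[0] | ((uint32_t)raw[1] << 8) |
	       ((uint32_t)raw[2] << 16) | ((uint32_t)raw[3] << 24);
}

/*
 * Hypothetical helper: decode the same bytes as big endian, which is
 * what the legacy interface presents to a big endian guest.
 */
static uint32_t cfg_decode_be32(const uint8_t *raw)
{
	assert(raw != NULL);
	return ((uint32_t)raw[0] << 24) | ((uint32_t)raw[1] << 16) |
	       ((uint32_t)raw[2] << 8) | (uint32_t)raw[3];
}
```

For the bytes 00 02 00 00, the little endian decode gives a block size
of 512, while the big endian decode gives 131072, so a validity check on
the field can pass or fail purely depending on the assumed byte order.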

The virtio spec explicitly states that the driver MAY read config
between reading and writing the features, so declaring config access
before the end of feature negotiation off limits is not an option. The
specification isn't clear about setting the features multiple times
before FEATURES_OK, so I guess that should be fine.

I don't consider this patch super clean, but frankly I don't think we
have a ton of options. Another option that may or may not be cleaner,
but is also IMHO much uglier, is to figure out whether the device is
transitional by rejecting _F_VERSION_1, then resetting it and proceeding
according to what we have figured out, hoping that the characteristics
of the device didn't change.

Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config space")
Reported-by: markver@us.ibm.com
---
 drivers/virtio/virtio.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 0a5b54034d4b..9dc3cfa17b1c 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -249,6 +249,10 @@ static int virtio_dev_probe(struct device *_d)
 		if (device_features & (1ULL << i))
 			__virtio_set_bit(dev, i);
 
+	/* Write back features before validate to know endianness */
+	if (device_features & (1ULL << VIRTIO_F_VERSION_1))
+		dev->config->finalize_features(dev);
+
 	if (drv->validate) {
 		err = drv->validate(dev);
 		if (err)

base-commit: 02d5e016800d082058b3d3b7c3ede136cdc6ddcb
-- 
2.25.1



* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-09-30  1:20 [RFC PATCH 1/1] virtio: write back features before verify Halil Pasic
@ 2021-09-30  8:04 ` Christian Borntraeger
  2021-09-30  9:28 ` Cornelia Huck
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 52+ messages in thread
From: Christian Borntraeger @ 2021-09-30  8:04 UTC (permalink / raw)
  To: Halil Pasic, Michael S. Tsirkin, Jason Wang, Xie Yongji,
	virtualization, linux-kernel
  Cc: markver, Cornelia Huck, linux-s390



Am 30.09.21 um 03:20 schrieb Halil Pasic:
> This patch fixes a regression introduced by commit 82e89ea077b9
> ("virtio-blk: Add validation for block size in config space") and
> enables similar checks in verify() on big endian platforms.
> 
> The problem with checking multi-byte config fields in the verify
> callback, on big endian platforms, and with a possibly transitional
> device is the following. The verify() callback is called between
> config->get_features() and virtio_finalize_features(). That we have a
> device that offered F_VERSION_1 then we have the following options
> either the device is transitional, and then it has to present the legacy
> interface, i.e. a big endian config space until F_VERSION_1 is
> negotiated, or we have a non-transitional device, which makes
> F_VERSION_1 mandatory, and only implements the non-legacy interface and
> thus presents a little endian config space. Because at this point we
> can't know if the device is transitional or non-transitional, we can't
> know do we need to byte swap or not.
> 
> The virtio spec explicitly states that the driver MAY read config
> between reading and writing the features so saying that first accessing
> the config before feature negotiation is done is not an option. The
> specification ain't clear about setting the features multiple times
> before FEATURES_OK, so I guess that should be fine.
> 
> I don't consider this patch super clean, but frankly I don't think we
> have a ton of options. Another option that may or may not be cleaner,
> but is also IMHO much uglier is to figure out whether the device is
> transitional by rejecting _F_VERSION_1, then resetting it and proceeding
> according to what we have figured out, hoping that the characteristics
> of the device didn't change.
> 
> Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
> Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config space")
> Reported-by: markver@us.ibm.com

To make sure that it lands there, maybe add
cc stable 5.14
> ---
>   drivers/virtio/virtio.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 0a5b54034d4b..9dc3cfa17b1c 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -249,6 +249,10 @@ static int virtio_dev_probe(struct device *_d)
>   		if (device_features & (1ULL << i))
>   			__virtio_set_bit(dev, i);
>   
> +	/* Write back features before validate to know endianness */
> +	if (device_features & (1ULL << VIRTIO_F_VERSION_1))
> +		dev->config->finalize_features(dev);
> +
>   	if (drv->validate) {
>   		err = drv->validate(dev);
>   		if (err)
> 
> base-commit: 02d5e016800d082058b3d3b7c3ede136cdc6ddcb
> 


* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-09-30  1:20 [RFC PATCH 1/1] virtio: write back features before verify Halil Pasic
  2021-09-30  8:04 ` Christian Borntraeger
@ 2021-09-30  9:28 ` Cornelia Huck
  2021-09-30 11:03   ` Halil Pasic
  2021-09-30 11:12 ` Michael S. Tsirkin
  2021-10-01 14:34 ` Christian Borntraeger
  3 siblings, 1 reply; 52+ messages in thread
From: Cornelia Huck @ 2021-09-30  9:28 UTC (permalink / raw)
  To: Halil Pasic, Michael S. Tsirkin, Jason Wang, Xie Yongji,
	virtualization, linux-kernel
  Cc: Halil Pasic, markver, Christian Borntraeger, linux-s390

On Thu, Sep 30 2021, Halil Pasic <pasic@linux.ibm.com> wrote:

> This patch fixes a regression introduced by commit 82e89ea077b9
> ("virtio-blk: Add validation for block size in config space") and
> enables similar checks in verify() on big endian platforms.
>
> The problem with checking multi-byte config fields in the verify
> callback, on big endian platforms, and with a possibly transitional
> device is the following. The verify() callback is called between
> config->get_features() and virtio_finalize_features(). That we have a
> device that offered F_VERSION_1 then we have the following options
> either the device is transitional, and then it has to present the legacy
> interface, i.e. a big endian config space until F_VERSION_1 is
> negotiated, or we have a non-transitional device, which makes
> F_VERSION_1 mandatory, and only implements the non-legacy interface and
> thus presents a little endian config space. Because at this point we
> can't know if the device is transitional or non-transitional, we can't
> know do we need to byte swap or not.
>
> The virtio spec explicitly states that the driver MAY read config
> between reading and writing the features so saying that first accessing
> the config before feature negotiation is done is not an option. The
> specification ain't clear about setting the features multiple times
> before FEATURES_OK, so I guess that should be fine.
>
> I don't consider this patch super clean, but frankly I don't think we
> have a ton of options. Another option that may or may not be cleaner,
> but is also IMHO much uglier is to figure out whether the device is
> transitional by rejecting _F_VERSION_1, then resetting it and proceeding
> according to what we have figured out, hoping that the characteristics
> of the device didn't change.
>
> Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
> Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config space")
> Reported-by: markver@us.ibm.com
> ---
>  drivers/virtio/virtio.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 0a5b54034d4b..9dc3cfa17b1c 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -249,6 +249,10 @@ static int virtio_dev_probe(struct device *_d)
>  		if (device_features & (1ULL << i))
>  			__virtio_set_bit(dev, i);
>  
> +	/* Write back features before validate to know endianness */
> +	if (device_features & (1ULL << VIRTIO_F_VERSION_1))
> +		dev->config->finalize_features(dev);

This really looks like a mess :(

We end up calling ->finalize_features twice: once before ->validate, and
once after, that time with the complete song and dance. The first time,
we operate on one feature set; after validation, we operate on another,
and there might be interdependencies between the two (like that a bit
is cleared because of another bit, which would not happen if validate
had a chance to clear that bit before).

I'm not sure whether that is even a problem in the spec: while the
driver may read the config before finally accepting features, it does
not really make sense to do so before a feature bit as basic as
VERSION_1 which determines the endianness has been negotiated. For
VERSION_1, we can probably go ahead and just assume that we will accept
it if offered, but what about other (future) bits?
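
The ordering concern can be illustrated with a toy model. Everything in
this sketch is hypothetical: FEAT_A, FEAT_B and both helper functions
are invented for the example and do not correspond to real virtio
feature bits or to any transport's actual finalize logic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define FEAT_A (1ULL << 0)
#define FEAT_B (1ULL << 1)

/* Hypothetical finalize step: B is dropped whenever A is present. */
static uint64_t toy_finalize(uint64_t features)
{
	if (features & FEAT_A)
		features &= ~FEAT_B;
	return features;
}

/* Hypothetical validate step: the driver decides it cannot use A. */
static uint64_t toy_validate(uint64_t features)
{
	return features & ~FEAT_A;
}

/* Ordering without the patch: validate, then finalize once. */
static uint64_t validate_then_finalize(uint64_t offered)
{
	return toy_finalize(toy_validate(offered));
}

/* Ordering with the extra early call: finalize, validate, finalize. */
static uint64_t finalize_validate_finalize(uint64_t offered)
{
	return toy_finalize(toy_validate(toy_finalize(offered)));
}
```

With both bits offered, the first ordering ends up with FEAT_B, while
the second ends up with neither bit, because the early finalize already
cleared FEAT_B before validate had a chance to clear FEAT_A.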

> +
>  	if (drv->validate) {
>  		err = drv->validate(dev);
>  		if (err)



* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-09-30  9:28 ` Cornelia Huck
@ 2021-09-30 11:03   ` Halil Pasic
  2021-09-30 11:31     ` Cornelia Huck
  0 siblings, 1 reply; 52+ messages in thread
From: Halil Pasic @ 2021-09-30 11:03 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Michael S. Tsirkin, Jason Wang, Xie Yongji, virtualization,
	linux-kernel, markver, Christian Borntraeger, linux-s390,
	Halil Pasic

On Thu, 30 Sep 2021 11:28:23 +0200
Cornelia Huck <cohuck@redhat.com> wrote:

> On Thu, Sep 30 2021, Halil Pasic <pasic@linux.ibm.com> wrote:
> 
> > This patch fixes a regression introduced by commit 82e89ea077b9
> > ("virtio-blk: Add validation for block size in config space") and
> > enables similar checks in verify() on big endian platforms.
> >
> > The problem with checking multi-byte config fields in the verify
> > callback, on big endian platforms, and with a possibly transitional
> > device is the following. The verify() callback is called between
> > config->get_features() and virtio_finalize_features(). That we have a
> > device that offered F_VERSION_1 then we have the following options
> > either the device is transitional, and then it has to present the legacy
> > interface, i.e. a big endian config space until F_VERSION_1 is
> > negotiated, or we have a non-transitional device, which makes
> > F_VERSION_1 mandatory, and only implements the non-legacy interface and
> > thus presents a little endian config space. Because at this point we
> > can't know if the device is transitional or non-transitional, we can't
> > know do we need to byte swap or not.
> >
> > The virtio spec explicitly states that the driver MAY read config
> > between reading and writing the features so saying that first accessing
> > the config before feature negotiation is done is not an option. The
> > specification ain't clear about setting the features multiple times
> > before FEATURES_OK, so I guess that should be fine.
> >
> > I don't consider this patch super clean, but frankly I don't think we
> > have a ton of options. Another option that may or may not be cleaner,
> > but is also IMHO much uglier is to figure out whether the device is
> > transitional by rejecting _F_VERSION_1, then resetting it and proceeding
> > according to what we have figured out, hoping that the characteristics
> > of the device didn't change.
> >
> > Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
> > Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config space")
> > Reported-by: markver@us.ibm.com
> > ---
> >  drivers/virtio/virtio.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > index 0a5b54034d4b..9dc3cfa17b1c 100644
> > --- a/drivers/virtio/virtio.c
> > +++ b/drivers/virtio/virtio.c
> > @@ -249,6 +249,10 @@ static int virtio_dev_probe(struct device *_d)
> >  		if (device_features & (1ULL << i))
> >  			__virtio_set_bit(dev, i);
> >  
> > +	/* Write back features before validate to know endianness */
> > +	if (device_features & (1ULL << VIRTIO_F_VERSION_1))
> > +		dev->config->finalize_features(dev);  
> 
> This really looks like a mess :(
> 
> We end up calling ->finalize_features twice: once before ->validate, and
> once after, that time with the complete song and dance. The first time,
> we operate on one feature set; after validation, we operate on another,
> > and there might be interdependencies between the two (like that a bit
> is cleared because of another bit, which would not happen if validate
> had a chance to clear that bit before).

Basically the second set is a subset of the first set.

> 
> I'm not sure whether that is even a problem in the spec: while the
> driver may read the config before finally accepting features

I'm not sure I'm following you. Let me please quote the specification:
"""
4. Read device feature bits, and write the subset of feature bits
understood by the OS and driver to the device. During this step the
driver MAY read (but MUST NOT write) the device-specific configuration
fields to check that it can support the device before accepting it.
5. Set the FEATURES_OK status bit. The driver MUST NOT accept new
feature bits after this step.
"""
https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-930001
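
The two quoted steps can be sketched as plain C. This is a userspace
illustration, not kernel code: the status bit values are the ones from
the virtio specification, but the macro and function names are invented
for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Device status bit values as defined by the virtio specification. */
#define STATUS_ACKNOWLEDGE 1u
#define STATUS_DRIVER      2u
#define STATUS_FEATURES_OK 8u

/* Step 4: write back only the offered bits the driver understands. */
static uint64_t features_to_write(uint64_t offered, uint64_t understood)
{
	return offered & understood;
}

/*
 * Steps 4 and 5: the feature set may still change (and config may be
 * read to decide on it) only until FEATURES_OK has been set.
 */
static bool may_still_change_features(uint8_t status)
{
	return (status & STATUS_DRIVER) && !(status & STATUS_FEATURES_OK);
}
```

Note that nothing in this window tells the driver which byte order the
config space uses, which is the gap being discussed.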

> , it does
> not really make sense to do so before a feature bit as basic as
> VERSION_1 which determines the endianness has been negotiated. 

Are you suggesting that ->verify() should be after
virtio_finalize_features()? Wouldn't
that mean that verify() can't reject feature bits? But that is the whole
point of commit 82e89ea077b9 ("virtio-blk: Add validation for block size
in config space"). Do you think that the commit in question is
conceptually flawed? My understanding of the verify is, that it is supposed
to fence features and feature bits we can't support, e.g. because of
config space things, but I may be wrong.

The trouble is, feature bits are not negotiated one by one, but basically all
at once. I suppose, I did the next best thing to first negotiating VERSION_1.


> For
> VERSION_1, we can probably go ahead and just assume that we will accept
> it if offered, but what about other (future) bits?

I don't quite understand.

Anyway, how do you think we should solve this problem?

Regards,
Halil

> 
> > +
> >  	if (drv->validate) {
> >  		err = drv->validate(dev);
> >  		if (err)  
> 



* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-09-30  1:20 [RFC PATCH 1/1] virtio: write back features before verify Halil Pasic
  2021-09-30  8:04 ` Christian Borntraeger
  2021-09-30  9:28 ` Cornelia Huck
@ 2021-09-30 11:12 ` Michael S. Tsirkin
  2021-09-30 11:36   ` Cornelia Huck
  2021-10-01  7:21   ` Halil Pasic
  2021-10-01 14:34 ` Christian Borntraeger
  3 siblings, 2 replies; 52+ messages in thread
From: Michael S. Tsirkin @ 2021-09-30 11:12 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Jason Wang, Xie Yongji, virtualization, linux-kernel, markver,
	Cornelia Huck, Christian Borntraeger, linux-s390

On Thu, Sep 30, 2021 at 03:20:49AM +0200, Halil Pasic wrote:
> This patch fixes a regression introduced by commit 82e89ea077b9
> ("virtio-blk: Add validation for block size in config space") and
> enables similar checks in verify() on big endian platforms.
> 
> The problem with checking multi-byte config fields in the verify
> callback, on big endian platforms, and with a possibly transitional
> device is the following. The verify() callback is called between
> config->get_features() and virtio_finalize_features(). That we have a
> device that offered F_VERSION_1 then we have the following options
> either the device is transitional, and then it has to present the legacy
> interface, i.e. a big endian config space until F_VERSION_1 is
> negotiated, or we have a non-transitional device, which makes
> F_VERSION_1 mandatory, and only implements the non-legacy interface and
> thus presents a little endian config space. Because at this point we
> can't know if the device is transitional or non-transitional, we can't
> know do we need to byte swap or not.

Hmm which transport does this refer to?
Distinguishing between legacy and modern drivers is transport
specific.  PCI presents
legacy and modern at separate addresses so distinguishing
between these two should be no trouble.
Channel i/o has versioning so same thing?

> The virtio spec explicitly states that the driver MAY read config
> between reading and writing the features so saying that first accessing
> the config before feature negotiation is done is not an option. The
> specification ain't clear about setting the features multiple times
> before FEATURES_OK, so I guess that should be fine.
> 
> I don't consider this patch super clean, but frankly I don't think we
> have a ton of options. Another option that may or may not be cleaner,
> but is also IMHO much uglier is to figure out whether the device is
> transitional by rejecting _F_VERSION_1, then resetting it and proceeding
> according to what we have figured out, hoping that the characteristics
> of the device didn't change.

I am confused here. So is the problem at the device or at the driver level?
I suspect it's actually the host that has the issue, not
the guest?


> Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
> Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config space")
> Reported-by: markver@us.ibm.com
> ---
>  drivers/virtio/virtio.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 0a5b54034d4b..9dc3cfa17b1c 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -249,6 +249,10 @@ static int virtio_dev_probe(struct device *_d)
>  		if (device_features & (1ULL << i))
>  			__virtio_set_bit(dev, i);
>  
> +	/* Write back features before validate to know endianness */
> +	if (device_features & (1ULL << VIRTIO_F_VERSION_1))
> +		dev->config->finalize_features(dev);
> +
>  	if (drv->validate) {
>  		err = drv->validate(dev);
>  		if (err)
> 
> base-commit: 02d5e016800d082058b3d3b7c3ede136cdc6ddcb
> -- 
> 2.25.1



* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-09-30 11:03   ` Halil Pasic
@ 2021-09-30 11:31     ` Cornelia Huck
  2021-10-01 14:22       ` Halil Pasic
  2021-10-02 12:09       ` Michael S. Tsirkin
  0 siblings, 2 replies; 52+ messages in thread
From: Cornelia Huck @ 2021-09-30 11:31 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Michael S. Tsirkin, Jason Wang, Xie Yongji, virtualization,
	linux-kernel, markver, Christian Borntraeger, linux-s390,
	Halil Pasic

On Thu, Sep 30 2021, Halil Pasic <pasic@linux.ibm.com> wrote:

> On Thu, 30 Sep 2021 11:28:23 +0200
> Cornelia Huck <cohuck@redhat.com> wrote:
>
>> On Thu, Sep 30 2021, Halil Pasic <pasic@linux.ibm.com> wrote:
>> 
>> > This patch fixes a regression introduced by commit 82e89ea077b9
>> > ("virtio-blk: Add validation for block size in config space") and
>> > enables similar checks in verify() on big endian platforms.
>> >
>> > The problem with checking multi-byte config fields in the verify
>> > callback, on big endian platforms, and with a possibly transitional
>> > device is the following. The verify() callback is called between
>> > config->get_features() and virtio_finalize_features(). That we have a
>> > device that offered F_VERSION_1 then we have the following options
>> > either the device is transitional, and then it has to present the legacy
>> > interface, i.e. a big endian config space until F_VERSION_1 is
>> > negotiated, or we have a non-transitional device, which makes
>> > F_VERSION_1 mandatory, and only implements the non-legacy interface and
>> > thus presents a little endian config space. Because at this point we
>> > can't know if the device is transitional or non-transitional, we can't
>> > know do we need to byte swap or not.
>> >
>> > The virtio spec explicitly states that the driver MAY read config
>> > between reading and writing the features so saying that first accessing
>> > the config before feature negotiation is done is not an option. The
>> > specification ain't clear about setting the features multiple times
>> > before FEATURES_OK, so I guess that should be fine.
>> >
>> > I don't consider this patch super clean, but frankly I don't think we
>> > have a ton of options. Another option that may or may not be cleaner,
>> > but is also IMHO much uglier is to figure out whether the device is
>> > transitional by rejecting _F_VERSION_1, then resetting it and proceeding
>> > according to what we have figured out, hoping that the characteristics
>> > of the device didn't change.
>> >
>> > Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
>> > Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config space")
>> > Reported-by: markver@us.ibm.com
>> > ---
>> >  drivers/virtio/virtio.c | 4 ++++
>> >  1 file changed, 4 insertions(+)
>> >
>> > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
>> > index 0a5b54034d4b..9dc3cfa17b1c 100644
>> > --- a/drivers/virtio/virtio.c
>> > +++ b/drivers/virtio/virtio.c
>> > @@ -249,6 +249,10 @@ static int virtio_dev_probe(struct device *_d)
>> >  		if (device_features & (1ULL << i))
>> >  			__virtio_set_bit(dev, i);
>> >  
>> > +	/* Write back features before validate to know endianness */
>> > +	if (device_features & (1ULL << VIRTIO_F_VERSION_1))
>> > +		dev->config->finalize_features(dev);  
>> 
>> This really looks like a mess :(
>> 
>> We end up calling ->finalize_features twice: once before ->validate, and
>> once after, that time with the complete song and dance. The first time,
>> we operate on one feature set; after validation, we operate on another,
>> and there might be interdependencies between the two (like that a bit
>> is cleared because of another bit, which would not happen if validate
>> had a chance to clear that bit before).
>
> Basically the second set is a subset of the first set.

I don't think that's clear.

>
>> 
>> I'm not sure whether that is even a problem in the spec: while the
>> driver may read the config before finally accepting features
>
> I'm not sure I'm following you. Let me please quote the specification:
> """
> 4. Read device feature bits, and write the subset of feature bits
> understood by the OS and driver to the device. During this step the driver MAY read (but MUST NOT write) the device-specific configuration fields to check that it can support the device before accepting it. 
> 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature bits after this step. 
> """
> https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-930001

Yes, exactly, it MAY read before accepting features. How does the device
know whether the config space is little-endian or not?

>
>> , it does
>> not really make sense to do so before a feature bit as basic as
>> VERSION_1 which determines the endianness has been negotiated. 
>
> Are you suggesting that ->verify() should be after
> virtio_finalize_features()?

No, that would defeat the entire purpose of verify. After
virtio_finalize_features(), we are done with feature negotiation.

> Wouldn't
> that mean that verify() can't reject feature bits? But that is the whole
> point of commit 82e89ea077b9 ("virtio-blk: Add validation for block size
> in config space"). Do you think that the commit in question is
> conceptually flawed? My understanding of the verify is, that it is supposed
> to fence features and feature bits we can't support, e.g. because of
> config space things, but I may be wrong.

No, that commit is not really flawed on its own, I think the whole
procedure may be problematic.

>
> The trouble is, feature bits are not negotiated one by one, but basically all
> at once. I suppose, I did the next best thing to first negotiating
> VERSION_1.

We probably need to special-case VERSION_1 to at least move forward;
i.e. proceed as if we accepted it when reading the config space.

The problem is that we do not know what the device assumes when we read
the config space prior to setting FEATURES_OK. It may assume
little-endian if it offered VERSION_1, or it may not. The spec does not
really say what happens before feature negotiation has finished.
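
A minimal sketch of the "assume we accepted it" idea, as a userspace
illustration rather than a proposed kernel change. The function names
are hypothetical; VIRTIO_F_VERSION_1 is bit 32 as in the spec. The
sketch also bakes in the unverified assumption that a device offering
VERSION_1 already presents a little endian config space before
negotiation has finished.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_F_VERSION_1 32	/* feature bit number from the virtio spec */

/* Hypothetical policy: treat an offered VERSION_1 as already accepted. */
static bool assume_le_config(uint64_t device_features)
{
	return (device_features & (1ULL << VIRTIO_F_VERSION_1)) != 0;
}

/*
 * Hypothetical early config read: little endian if VERSION_1 was
 * offered, otherwise guest-native byte order (the legacy rule).
 */
static uint32_t early_cfg_read32(const uint8_t *raw, uint64_t device_features,
				 bool guest_is_big_endian)
{
	assert(raw != NULL);
	if (!assume_le_config(device_features) && guest_is_big_endian)
		return ((uint32_t)raw[0] << 24) | ((uint32_t)raw[1] << 16) |
		       ((uint32_t)raw[2] << 8) | (uint32_t)raw[3];
	return (uint32_t)raw[0] | ((uint32_t)raw[1] << 8) |
	       ((uint32_t)raw[2] << 16) | ((uint32_t)raw[3] << 24);
}
```

If the device instead keeps presenting the legacy layout until the
features are actually written, this policy reads the wrong value on a
big endian guest, which is why it is only a band-aid.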

>
>
>> For
>> VERSION_1, we can probably go ahead and just assume that we will accept
>> it if offered, but what about other (future) bits?
>
> I don't quite understand.

There might be other bits in the future that change how the config space
works. We cannot assume that any of those bits will be accepted if
offered; i.e. we need a special hack for VERSION_1.

>
> Anyway, how do you think we should solve this problem?

This is a mess. For starters, we need to think about whether we should do
something in the spec, and if yes, what. Then, we can probably think
about how to implement that properly.

As we have an error right now that is basically a regression, we
probably need a band-aid to keep going. Not sure if your patch is the
right approach, maybe we really need to special-case VERSION_1 (the
"assume we accepted it" hack mentioned above.) This will likely fix the
reported problem (I assume that is s390x on QEMU); do we know about
other VMMs? Any other big-endian architectures?

Anyone have any better suggestions?



* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-09-30 11:12 ` Michael S. Tsirkin
@ 2021-09-30 11:36   ` Cornelia Huck
  2021-10-02 18:20     ` Michael S. Tsirkin
  2021-10-01  7:21   ` Halil Pasic
  1 sibling, 1 reply; 52+ messages in thread
From: Cornelia Huck @ 2021-09-30 11:36 UTC (permalink / raw)
  To: Michael S. Tsirkin, Halil Pasic
  Cc: Jason Wang, Xie Yongji, virtualization, linux-kernel, markver,
	Christian Borntraeger, linux-s390

On Thu, Sep 30 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Thu, Sep 30, 2021 at 03:20:49AM +0200, Halil Pasic wrote:
>> This patch fixes a regression introduced by commit 82e89ea077b9
>> ("virtio-blk: Add validation for block size in config space") and
>> enables similar checks in verify() on big endian platforms.
>> 
>> The problem with checking multi-byte config fields in the verify
>> callback, on big endian platforms, and with a possibly transitional
>> device is the following. The verify() callback is called between
>> config->get_features() and virtio_finalize_features(). That we have a
>> device that offered F_VERSION_1 then we have the following options
>> either the device is transitional, and then it has to present the legacy
>> interface, i.e. a big endian config space until F_VERSION_1 is
>> negotiated, or we have a non-transitional device, which makes
>> F_VERSION_1 mandatory, and only implements the non-legacy interface and
>> thus presents a little endian config space. Because at this point we
>> can't know if the device is transitional or non-transitional, we can't
>> know do we need to byte swap or not.
>
> Hmm which transport does this refer to?
> Distinguishing between legacy and modern drivers is transport
> specific.  PCI presents
> legacy and modern at separate addresses so distinguishing
> between these two should be no trouble.

Hm, what about transitional devices?

> Channel i/o has versioning so same thing?

It can turn off VERSION_1, but not legacy. (I had hacked up a patchset
to potentially disable legacy some time ago, but did not have any
resources to follow up on this.)

>
>> The virtio spec explicitly states that the driver MAY read config
>> between reading and writing the features so saying that first accessing
>> the config before feature negotiation is done is not an option. The
>> specification ain't clear about setting the features multiple times
>> before FEATURES_OK, so I guess that should be fine.
>> 
>> I don't consider this patch super clean, but frankly I don't think we
>> have a ton of options. Another option that may or may not be cleaner,
>> but is also IMHO much uglier is to figure out whether the device is
>> transitional by rejecting _F_VERSION_1, then resetting it and proceeding
>> according to what we have figured out, hoping that the characteristics
>> of the device didn't change.
>
> I am confused here. So is the problem at the device or at the driver level?
> I suspect it's actually the host that has the issue, not
> the guest?

From my perspective the problem is that the version of the device
remains in limbo as long as the features have not yet been finalized,
which means that the endianness of the config space remains in limbo as
well. Both device and driver might come to different conclusions.

>
>
>> Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
>> Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config space")
>> Reported-by: markver@us.ibm.com
>> ---
>>  drivers/virtio/virtio.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>> 
>> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
>> index 0a5b54034d4b..9dc3cfa17b1c 100644
>> --- a/drivers/virtio/virtio.c
>> +++ b/drivers/virtio/virtio.c
>> @@ -249,6 +249,10 @@ static int virtio_dev_probe(struct device *_d)
>>  		if (device_features & (1ULL << i))
>>  			__virtio_set_bit(dev, i);
>>  
>> +	/* Write back features before validate to know endianness */
>> +	if (device_features & (1ULL << VIRTIO_F_VERSION_1))
>> +		dev->config->finalize_features(dev);
>> +
>>  	if (drv->validate) {
>>  		err = drv->validate(dev);
>>  		if (err)
>> 
>> base-commit: 02d5e016800d082058b3d3b7c3ede136cdc6ddcb
>> -- 
>> 2.25.1



* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-09-30 11:12 ` Michael S. Tsirkin
  2021-09-30 11:36   ` Cornelia Huck
@ 2021-10-01  7:21   ` Halil Pasic
  2021-10-02 10:21     ` Michael S. Tsirkin
  1 sibling, 1 reply; 52+ messages in thread
From: Halil Pasic @ 2021-10-01  7:21 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, Xie Yongji, virtualization, linux-kernel, markver,
	Cornelia Huck, Christian Borntraeger, linux-s390, Halil Pasic

On Thu, 30 Sep 2021 07:12:21 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Thu, Sep 30, 2021 at 03:20:49AM +0200, Halil Pasic wrote:
> > This patch fixes a regression introduced by commit 82e89ea077b9
> > ("virtio-blk: Add validation for block size in config space") and
> > enables similar checks in verify() on big endian platforms.
> > 
> > The problem with checking multi-byte config fields in the verify
> > callback, on big endian platforms, and with a possibly transitional
> > device is the following. The verify() callback is called between
> > config->get_features() and virtio_finalize_features(). That we have a
> > device that offered F_VERSION_1 then we have the following options
> > either the device is transitional, and then it has to present the legacy
> > interface, i.e. a big endian config space until F_VERSION_1 is
> > negotiated, or we have a non-transitional device, which makes
> > F_VERSION_1 mandatory, and only implements the non-legacy interface and
> > thus presents a little endian config space. Because at this point we
> > can't know if the device is transitional or non-transitional, we can't
> > know do we need to byte swap or not.  
> 
> Hmm which transport does this refer to?

It is the same with virtio-ccw and virtio-pci. I see the same problem
with both on s390x. I didn't try with virtio-blk-pci-non-transitional
yet (I have to figure out how to do that with libvirt); for PCI I used
virtio-blk-pci.

> Distinguishing between legacy and modern drivers is transport
> specific.  PCI presents
> legacy and modern at separate addresses so distinguishing
> between these two should be no trouble.

You mean the device id? Yes, that is bolted down in the spec, but
currently we don't exploit that information. Furthermore, there
is a good chance that with QEMU even the allegedly non-transitional
devices only present a little endian config space after VERSION_1
has been negotiated. Namely, get_config for virtio-blk is implemented in
virtio_blk_update_config(), which does virtio_stl_p(vdev,
&blkcfg.blk_size, blk_size), and in there we don't care
about transitional or not:

static inline bool virtio_access_is_big_endian(VirtIODevice *vdev)
{
#if defined(LEGACY_VIRTIO_IS_BIENDIAN)
    return virtio_is_big_endian(vdev);
#elif defined(TARGET_WORDS_BIGENDIAN)
    if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
        /* Devices conforming to VIRTIO 1.0 or later are always LE. */
        return false;
    }
    return true;
#else
    return false;
#endif
}
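
To make the consequence concrete, here is a minimal standalone sketch
(not QEMU code) of a store helper that, like the TARGET_WORDS_BIGENDIAN
branch above, picks the byte order from whether VERSION_1 has already
been written back by the driver. The names model_dev, model_stl_p,
read_le32 and blk_size_seen are made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a device on a big-endian target. */
struct model_dev {
    bool version_1_acked;   /* has the driver written VERSION_1 back? */
};

/* Mirrors the TARGET_WORDS_BIGENDIAN branch of the function above. */
static bool model_access_is_big_endian(const struct model_dev *dev)
{
    return !dev->version_1_acked;
}

/* Store a 32-bit config field in the byte order chosen above. */
static void model_stl_p(const struct model_dev *dev, uint8_t *p, uint32_t v)
{
    assert(dev && p);
    if (model_access_is_big_endian(dev)) {
        p[0] = (uint8_t)(v >> 24);
        p[1] = (uint8_t)(v >> 16);
        p[2] = (uint8_t)(v >> 8);
        p[3] = (uint8_t)v;
    } else {
        p[0] = (uint8_t)v;
        p[1] = (uint8_t)(v >> 8);
        p[2] = (uint8_t)(v >> 16);
        p[3] = (uint8_t)(v >> 24);
    }
}

/* A driver that assumes a modern device always decodes little endian. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* The blk_size value such a driver would see for a given device state. */
static uint32_t blk_size_seen(bool version_1_acked, uint32_t blk_size)
{
    struct model_dev dev = { .version_1_acked = version_1_acked };
    uint8_t cfg[4];

    model_stl_p(&dev, cfg, blk_size);
    return read_le32(cfg);
}
```

In this model, blk_size_seen(false, 4096) gives 1048576 (0x00100000),
while blk_size_seen(true, 4096) gives 4096, which is the kind of
mismatch a block size check in validate would trip over.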


> Channel i/o has versioning so same thing?
>

Don't think so. Both a transitional and a non-transitional device
would have to accept revisions higher than 0 if the driver tried to
negotiate those (and we do in our case).
 
> > The virtio spec explicitly states that the driver MAY read config
> > between reading and writing the features so saying that first accessing
> > the config before feature negotiation is done is not an option. The
> > specification ain't clear about setting the features multiple times
> > before FEATURES_OK, so I guess that should be fine.
> > 
> > I don't consider this patch super clean, but frankly I don't think we
> > have a ton of options. Another option that may or man not be cleaner,
> > but is also IMHO much uglier is to figure out whether the device is
> > transitional by rejecting _F_VERSION_1, then resetting it and proceeding
> > according tho what we have figured out, hoping that the characteristics
> > of the device didn't change.  
> 
> I am confused here. So is the problem at the device or at the driver level?

We have a driver regression. Since commit 82e89ea077b9 ("virtio-blk: Add
validation for block size in config space"), virtio-blk is broken on
s390.

The deeper problem is in the spec. We stated that the driver may read
config space before the feature negotiation is finalized, but we didn't
think enough about what happens when native endianness is not little
endian in the different cases.

I believe, for non-transitional devices we have a problem in the host as
well (i.e. in QEMU).

> I suspect it's actually the host that has the issue, not
> the guest?

I tend to say we have a problem both in the host and in the guest. I'm
more concerned about the problem in the guest, because that is a really
nasty regression. For the host, I think we don't have a problem with
legacy, because both sides would operate on the assumption of no
_F_VERSION_1. IMHO the implementation for the transitional devices is
correct. For the non-transitional flavor, it depends on the device. For
example, virtio-net and virtio-blk are broken, because we use primitives
like virtio_stl_p(), and those don't do the right thing before feature
negotiation is completed. On the other hand, virtio-crypto.c, as a truly
non-transitional device, uses stl_le_p() and IMHO does the right thing.
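
For contrast, a store that is unconditionally little endian cannot
depend on the negotiation state at all. The following is only a sketch
of that idea under made-up names (store_le32, load_le32, le32_roundtrip,
le32_byte); it is not the QEMU stl_le_p() implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Store v at p as little endian, regardless of any feature state. */
static void store_le32(uint8_t *p, uint32_t v)
{
    assert(p);
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Decode four little-endian bytes into a host integer. */
static uint32_t load_le32(const uint8_t *p)
{
    assert(p);
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Encode then decode; no negotiation state is involved anywhere. */
static uint32_t le32_roundtrip(uint32_t v)
{
    uint8_t cfg[4];

    store_le32(cfg, v);
    return load_le32(cfg);
}

/* Byte found at config offset 'off' after a little-endian store of v. */
static uint8_t le32_byte(uint32_t v, unsigned int off)
{
    uint8_t cfg[4];

    assert(off < 4);
    store_le32(cfg, v);
    return cfg[off];
}
```

A block size of 512 (0x200) always ends up as the bytes 00 02 00 00
here, whether or not the driver has written any features yet.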

Thanks for your comments! I hope I managed to answer your questions. I
need some guidance on how we want to move forward on this.

Regards,
Halil

> 
> 
> > Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
> > Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config space")
> > Reported-by: markver@us.ibm.com
> > ---
> >  drivers/virtio/virtio.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > index 0a5b54034d4b..9dc3cfa17b1c 100644
> > --- a/drivers/virtio/virtio.c
> > +++ b/drivers/virtio/virtio.c
> > @@ -249,6 +249,10 @@ static int virtio_dev_probe(struct device *_d)
> >  		if (device_features & (1ULL << i))
> >  			__virtio_set_bit(dev, i);
> >  
> > +	/* Write back features before validate to know endianness */
> > +	if (device_features & (1ULL << VIRTIO_F_VERSION_1))
> > +		dev->config->finalize_features(dev);
> > +
> >  	if (drv->validate) {
> >  		err = drv->validate(dev);
> >  		if (err)
> > 
> > base-commit: 02d5e016800d082058b3d3b7c3ede136cdc6ddcb
> > -- 
> > 2.25.1  
> 



* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-09-30 11:31     ` Cornelia Huck
@ 2021-10-01 14:22       ` Halil Pasic
  2021-10-01 15:18         ` Cornelia Huck
  2021-10-02 12:09       ` Michael S. Tsirkin
  1 sibling, 1 reply; 52+ messages in thread
From: Halil Pasic @ 2021-10-01 14:22 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Michael S. Tsirkin, Jason Wang, Xie Yongji, virtualization,
	linux-kernel, markver, Christian Borntraeger, linux-s390,
	Halil Pasic

On Thu, 30 Sep 2021 13:31:04 +0200
Cornelia Huck <cohuck@redhat.com> wrote:

> On Thu, Sep 30 2021, Halil Pasic <pasic@linux.ibm.com> wrote:
> 
> > On Thu, 30 Sep 2021 11:28:23 +0200
> > Cornelia Huck <cohuck@redhat.com> wrote:
> >  
> >> On Thu, Sep 30 2021, Halil Pasic <pasic@linux.ibm.com> wrote:
> >>   
> >> > This patch fixes a regression introduced by commit 82e89ea077b9
> >> > ("virtio-blk: Add validation for block size in config space") and
> >> > enables similar checks in verify() on big endian platforms.
> >> >
> >> > The problem with checking multi-byte config fields in the verify
> >> > callback, on big endian platforms, and with a possibly transitional
> >> > device is the following. The verify() callback is called between
> >> > config->get_features() and virtio_finalize_features(). That we have a
> >> > device that offered F_VERSION_1 then we have the following options
> >> > either the device is transitional, and then it has to present the legacy
> >> > interface, i.e. a big endian config space until F_VERSION_1 is
> >> > negotiated, or we have a non-transitional device, which makes
> >> > F_VERSION_1 mandatory, and only implements the non-legacy interface and
> >> > thus presents a little endian config space. Because at this point we
> >> > can't know if the device is transitional or non-transitional, we can't
> >> > know do we need to byte swap or not.
> >> >
> >> > The virtio spec explicitly states that the driver MAY read config
> >> > between reading and writing the features so saying that first accessing
> >> > the config before feature negotiation is done is not an option. The
> >> > specification ain't clear about setting the features multiple times
> >> > before FEATURES_OK, so I guess that should be fine.
> >> >
> >> > I don't consider this patch super clean, but frankly I don't think we
> >> > have a ton of options. Another option that may or man not be cleaner,
> >> > but is also IMHO much uglier is to figure out whether the device is
> >> > transitional by rejecting _F_VERSION_1, then resetting it and proceeding
> >> > according tho what we have figured out, hoping that the characteristics
> >> > of the device didn't change.
> >> >
> >> > Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
> >> > Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config space")
> >> > Reported-by: markver@us.ibm.com
> >> > ---
> >> >  drivers/virtio/virtio.c | 4 ++++
> >> >  1 file changed, 4 insertions(+)
> >> >
> >> > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> >> > index 0a5b54034d4b..9dc3cfa17b1c 100644
> >> > --- a/drivers/virtio/virtio.c
> >> > +++ b/drivers/virtio/virtio.c
> >> > @@ -249,6 +249,10 @@ static int virtio_dev_probe(struct device *_d)
> >> >  		if (device_features & (1ULL << i))
> >> >  			__virtio_set_bit(dev, i);
> >> >  
> >> > +	/* Write back features before validate to know endianness */
> >> > +	if (device_features & (1ULL << VIRTIO_F_VERSION_1))
> >> > +		dev->config->finalize_features(dev);    
> >> 
> >> This really looks like a mess :(
> >> 
> >> We end up calling ->finalize_features twice: once before ->validate, and
> >> once after, that time with the complete song and dance. The first time,
> >> we operate on one feature set; after validation, we operate on another,
> >> and there might be interdependencies between the two (like a that a bit
> >> is cleared because of another bit, which would not happen if validate
> >> had a chance to clear that bit before).  
> >
> > Basically the second set is a subset of the first set.  
> 
> I don't think that's clear.

Validate can only remove features, right? So I guess the feature set
after validate is a subset of the one before validate.
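
In bitmask terms, the subset claim can be written down as follows. This
is a sketch with hypothetical helpers (features_subset, model_validate),
assuming validate only ever clears bits; it says nothing about side
effects inside the device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if every bit set in 'after' is also set in 'before'. */
static bool features_subset(uint64_t after, uint64_t before)
{
    return (after & ~before) == 0;
}

/* Hypothetical validate step: it may clear bits but never set new ones. */
static uint64_t model_validate(uint64_t features, uint64_t unsupported)
{
    return features & ~unsupported;
}
```

For example, starting from bits 32, 6 and 1 (0x100000042) and clearing
bit 6 leaves 0x100000002, which passes features_subset() against the
original set.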


> 
> >  
> >> 
> >> I'm not sure whether that is even a problem in the spec: while the
> >> driver may read the config before finally accepting features  
> >
> > I'm not sure I'm following you. Let me please qoute the specification:
> > """
> > 4. Read device feature bits, and write the subset of feature bits
> > understood by the OS and driver to the device. During this step the driver MAY read (but MUST NOT write) the device-specific configuration fields to check that it can support the device before accepting it. 
> > 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature bits after this step. 
> > """
> > https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-930001  
> 
> Yes, exactly, it MAY read before accepting features. How does the device
> know whether the config space is little-endian or not?
> 

Well, that is what we are talking about. One can try to infer things from
the spec. This reset dance I called ugly is probably the cleanest,
because the spec says that renegotiation should work.

> >  
> >> , it does
> >> not really make sense to do so before a feature bit as basic as
> >> VERSION_1 which determines the endianness has been negotiated.   
> >
> > Are you suggesting that ->verify() should be after
> > virtio_finalize_features()?  
> 
> No, that would defeat the entire purpose of verify. After
> virtio_finalize_features(), we are done with feature negotiation.
>

Exactly!
 
> > Wouldn't
> > that mean that verify() can't reject feature bits? But that is the whole
> > point of commit 82e89ea077b9 ("virtio-blk: Add validation for block size
> > in config space"). Do you think that the commit in question is
> > conceptually flawed? My understanding of the verify is, that it is supposed
> > to fence features and feature bits we can't support, e.g. because of
> > config space things, but I may be wrong.  
> 
> No, that commit is not really flawed on its own, I think the whole
> procedure may be problematic.
> 

I agree! But that regression really hurts us. Maybe the best band-aid is
to conditional-compile it (i.e. not compile the check on s390).

> >
> > The trouble is, feature bits are not negotiated one by one, but basically all
> > at once. I suppose, I did the next best thing to first negotiating
> > VERSION_1.  
> 
> We probably need to special-case VERSION_1 to move at least forward;
> i.e. proceed as if we accepted it when reading the config space.
> 
> The problem is that we do not know what the device assumes when we read
> the config space prior to setting FEATURES_OK. It may assume
> little-endian if it offered VERSION_1, or it may not. The spec does not
> really say what happens before feature negotiation has finished.
> 
No, it does not, but I hope the implementations we care the most about
use little endian if VERSION_1 is set but FEATURES_OK is not yet done. A
transitional device would have to act upon a feature that is set,
because for legacy there is no FEATURES_OK. Where we can run into
trouble is the minimum required feature set, e.g. mandatory features.

I will do some testing.

> >
> >  
> >> For
> >> VERSION_1, we can probably go ahead and just assume that we will accept
> >> it if offered, but what about other (future) bits?  
> >
> > I don't quite understand.  
> 
> There might be other bits in the future that change how the config space
> works. We cannot assume that any of those bits will be accepted if
> offered; i.e. we need a special hack for VERSION_1.

I tend to agree. What I didn't consider in this patch is that setting
bits does not only set bits, but may also change the device in a way
that clearing the bits would not change back.

> 
> >
> > Anyway, how do you think we should solve this problem?  
> 
> This is a mess. For starters, we need to think about if we should do
> something in the spec, and if yes, what.. Then, we can probably think
> about how to implement that properly.
>

I agree.

 
> As we have an error right now that is basically a regression, we
> probably need a band-aid to keep going. Not sure if your patch is the
> right approach, maybe we really need to special-case VERSION_1 (the
> "assume we accepted it" hack mentioned above.) This will likely fix the
> reported problem (I assume that is s390x on QEMU); do we know about
> other VMMs? Any other big-endian architectures?

I didn't quite get it. Would this hack take place in QEMU or in the guest
kernel?

> 
> Anyone have any better suggestions?
> 

There is the conditional compile as an option, but I would not say it is
better.

Regards,
Halil


* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-09-30  1:20 [RFC PATCH 1/1] virtio: write back features before verify Halil Pasic
                   ` (2 preceding siblings ...)
  2021-09-30 11:12 ` Michael S. Tsirkin
@ 2021-10-01 14:34 ` Christian Borntraeger
  3 siblings, 0 replies; 52+ messages in thread
From: Christian Borntraeger @ 2021-10-01 14:34 UTC (permalink / raw)
  To: Halil Pasic, Michael S. Tsirkin, Jason Wang, Xie Yongji,
	virtualization, linux-kernel
  Cc: markver, Cornelia Huck, linux-s390

On 30.09.21 at 03:20, Halil Pasic wrote:
> This patch fixes a regression introduced by commit 82e89ea077b9
> ("virtio-blk: Add validation for block size in config space") and
> enables similar checks in verify() on big endian platforms.
> 
> The problem with checking multi-byte config fields in the verify
> callback, on big endian platforms, and with a possibly transitional
> device is the following. The verify() callback is called between
> config->get_features() and virtio_finalize_features(). That we have a
> device that offered F_VERSION_1 then we have the following options
> either the device is transitional, and then it has to present the legacy
> interface, i.e. a big endian config space until F_VERSION_1 is
> negotiated, or we have a non-transitional device, which makes
> F_VERSION_1 mandatory, and only implements the non-legacy interface and
> thus presents a little endian config space. Because at this point we
> can't know if the device is transitional or non-transitional, we can't
> know do we need to byte swap or not.
> 
> The virtio spec explicitly states that the driver MAY read config
> between reading and writing the features so saying that first accessing
> the config before feature negotiation is done is not an option. The
> specification ain't clear about setting the features multiple times
> before FEATURES_OK, so I guess that should be fine.
> 
> I don't consider this patch super clean, but frankly I don't think we
> have a ton of options. Another option that may or man not be cleaner,
> but is also IMHO much uglier is to figure out whether the device is
> transitional by rejecting _F_VERSION_1, then resetting it and proceeding
> according tho what we have figured out, hoping that the characteristics
> of the device didn't change.
> 
> Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
> Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config space")
> Reported-by: markver@us.ibm.com

Just to make this more obvious: since 5.14, DASD devices as backing for virtio-blk no
longer work, as the block size is no longer reported to the guest. So we need a fix
for the issue.


* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-01 14:22       ` Halil Pasic
@ 2021-10-01 15:18         ` Cornelia Huck
  2021-10-02 18:13           ` Michael S. Tsirkin
  0 siblings, 1 reply; 52+ messages in thread
From: Cornelia Huck @ 2021-10-01 15:18 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Michael S. Tsirkin, Jason Wang, Xie Yongji, virtualization,
	linux-kernel, markver, Christian Borntraeger, linux-s390,
	Halil Pasic

On Fri, Oct 01 2021, Halil Pasic <pasic@linux.ibm.com> wrote:

> On Thu, 30 Sep 2021 13:31:04 +0200
> Cornelia Huck <cohuck@redhat.com> wrote:
>
>> On Thu, Sep 30 2021, Halil Pasic <pasic@linux.ibm.com> wrote:
>> 
>> > On Thu, 30 Sep 2021 11:28:23 +0200
>> > Cornelia Huck <cohuck@redhat.com> wrote:
>> >  
>> >> On Thu, Sep 30 2021, Halil Pasic <pasic@linux.ibm.com> wrote:
>> >> > @@ -249,6 +249,10 @@ static int virtio_dev_probe(struct device *_d)
>> >> >  		if (device_features & (1ULL << i))
>> >> >  			__virtio_set_bit(dev, i);
>> >> >  
>> >> > +	/* Write back features before validate to know endianness */
>> >> > +	if (device_features & (1ULL << VIRTIO_F_VERSION_1))
>> >> > +		dev->config->finalize_features(dev);    
>> >> 
>> >> This really looks like a mess :(
>> >> 
>> >> We end up calling ->finalize_features twice: once before ->validate, and
>> >> once after, that time with the complete song and dance. The first time,
>> >> we operate on one feature set; after validation, we operate on another,
>> >> and there might be interdependencies between the two (like a that a bit
>> >> is cleared because of another bit, which would not happen if validate
>> >> had a chance to clear that bit before).  
>> >
>> > Basically the second set is a subset of the first set.  
>> 
>> I don't think that's clear.
>
> Validate can only remove features, or? So I guess after validate
> is a subset of before validate.

I was thinking about (more-or-less hypothetical) interdependencies (see
above). But that's not terribly important.

>
>
>> 
>> >  
>> >> 
>> >> I'm not sure whether that is even a problem in the spec: while the
>> >> driver may read the config before finally accepting features  
>> >
>> > I'm not sure I'm following you. Let me please qoute the specification:
>> > """
>> > 4. Read device feature bits, and write the subset of feature bits
>> > understood by the OS and driver to the device. During this step the driver MAY read (but MUST NOT write) the device-specific configuration fields to check that it can support the device before accepting it. 
>> > 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature bits after this step. 
>> > """
>> > https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-930001  
>> 
>> Yes, exactly, it MAY read before accepting features. How does the device
>> know whether the config space is little-endian or not?
>> 
>
> Well that is what we are talking about. One can try to infer things from
> the spec. This reset dance I called ugly is probably the cleanest,
> because the spec says that re-nego should work.
>
>> >  
>> >> , it does
>> >> not really make sense to do so before a feature bit as basic as
>> >> VERSION_1 which determines the endianness has been negotiated.   
>> >
>> > Are you suggesting that ->verify() should be after
>> > virtio_finalize_features()?  
>> 
>> No, that would defeat the entire purpose of verify. After
>> virtio_finalize_features(), we are done with feature negotiation.
>>
>
> Exactly!

It seems we are in violent agreement :)

>  
>> > Wouldn't
>> > that mean that verify() can't reject feature bits? But that is the whole
>> > point of commit 82e89ea077b9 ("virtio-blk: Add validation for block size
>> > in config space"). Do you think that the commit in question is
>> > conceptually flawed? My understanding of the verify is, that it is supposed
>> > to fence features and feature bits we can't support, e.g. because of
>> > config space things, but I may be wrong.  
>> 
>> No, that commit is not really flawed on its own, I think the whole
>> procedure may be problematic.
>> 
>
> I agree! But that regression really hurts us. Maybe the best band-aid is
> to conditional-compile it (not compile the check if s390).

It's probably most likely to hit on s390 (big-endian, and devices with a
blocksize != 512 in common use); but I'd like to make that band-aid more
generic than "exclude for s390". A hack for honouring VERSION_1 before
negotiation has finished is probably better as a stop-gap before we
manage to figure out how to deal with this properly.
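
One way to picture such a stop-gap on the driver side is the sketch
below: before negotiation has finished, the byte order of a config read
is chosen from what the device offered, not from what has been
negotiated. All names here (MODEL_F_VERSION_1, cfg_read32_early and the
load helpers) are hypothetical, and the sketch assumes the device really
presents little-endian data at that point, which is exactly the open
question in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* VIRTIO_F_VERSION_1 is feature bit 32 in the virtio spec. */
#define MODEL_F_VERSION_1 32

static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint32_t load_be32(const uint8_t *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/*
 * Hypothetical early config read: if VERSION_1 was offered, proceed as
 * if it had already been accepted and decode little endian. Otherwise
 * fall back to the legacy rule, i.e. guest-native byte order.
 */
static uint32_t cfg_read32_early(uint64_t offered, bool guest_is_be,
                                 const uint8_t *p)
{
    assert(p);
    if (offered & (1ULL << MODEL_F_VERSION_1))
        return load_le32(p);
    return guest_is_be ? load_be32(p) : load_le32(p);
}
```

With VERSION_1 offered, the bytes 00 10 00 00 decode to 4096 even on a
big-endian guest; without it, the same guest would decode 00 00 10 00 as
4096 instead.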

>
>> >
>> > The trouble is, feature bits are not negotiated one by one, but basically all
>> > at once. I suppose, I did the next best thing to first negotiating
>> > VERSION_1.  
>> 
>> We probably need to special-case VERSION_1 to move at least forward;
>> i.e. proceed as if we accepted it when reading the config space.
>> 
>> The problem is that we do not know what the device assumes when we read
>> the config space prior to setting FEATURES_OK. It may assume
>> little-endian if it offered VERSION_1, or it may not. The spec does not
>> really say what happens before feature negotiation has finished.
>> 
> No it does not, but I hope, the implementations we care the most about do
> little endian if VERSION_1 is set but FEATURES_OK is not yet done. A
> transitional device would have to act upon a feature that is set,
> because for legacy there is no FEATURES_OK. Where we can run into
> trouble is minimum required feature set, e.g. mandatory features.

All ugly :(

>
> I will do some testing.
>
>> >
>> >  
>> >> For
>> >> VERSION_1, we can probably go ahead and just assume that we will accept
>> >> it if offered, but what about other (future) bits?  
>> >
>> > I don't quite understand.  
>> 
>> There might be other bits in the future that change how the config space
>> works. We cannot assume that any of those bits will be accepted if
>> offered; i.e. we need a special hack for VERSION_1.
>
> I tend to agree. What I didn't consider in this patch is that, setting
> bits does not only set bits, but may also change the device in a way,
> that clearing the bit would not change it back.
>
>> 
>> >
>> > Anyway, how do you think we should solve this problem?  
>> 
>> This is a mess. For starters, we need to think about if we should do
>> something in the spec, and if yes, what.. Then, we can probably think
>> about how to implement that properly.
>>
>
> I agree.
>
>  
>> As we have an error right now that is basically a regression, we
>> probably need a band-aid to keep going. Not sure if your patch is the
>> right approach, maybe we really need to special-case VERSION_1 (the
>> "assume we accepted it" hack mentioned above.) This will likely fix the
>> reported problem (I assume that is s390x on QEMU); do we know about
>> other VMMs? Any other big-endian architectures?
>
> I didn't quite get it. Would this hack take place in QEMU or in the guest
> kernel?

I'd say we need a hack here so that we assume little-endian config space
if VERSION_1 has been offered; if your patch here works, I assume QEMU
does what we expect (assuming little-endian as well.) I'm mostly
wondering what happens if you use a different VMM; can we expect it to
work similarly to QEMU? Even if it helps for s390, we should double-check
what happens for other architectures.

>
>> 
>> Anyone have any better suggestions?
>> 
>
> There is the conditional compile, as an option but I would not say it is
> better.

Yes, I agree.

Anyone else have an idea? This is a nasty regression; we could revert the
patch, which would remove the symptoms and give us some time, but that
doesn't really feel right; I'd do that only as a last resort.



* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-01  7:21   ` Halil Pasic
@ 2021-10-02 10:21     ` Michael S. Tsirkin
  2021-10-04 12:19       ` Cornelia Huck
  0 siblings, 1 reply; 52+ messages in thread
From: Michael S. Tsirkin @ 2021-10-02 10:21 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Jason Wang, Xie Yongji, virtualization, linux-kernel, markver,
	Cornelia Huck, Christian Borntraeger, linux-s390

On Fri, Oct 01, 2021 at 09:21:25AM +0200, Halil Pasic wrote:
> On Thu, 30 Sep 2021 07:12:21 -0400
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Thu, Sep 30, 2021 at 03:20:49AM +0200, Halil Pasic wrote:
> > > This patch fixes a regression introduced by commit 82e89ea077b9
> > > ("virtio-blk: Add validation for block size in config space") and
> > > enables similar checks in verify() on big endian platforms.
> > > 
> > > The problem with checking multi-byte config fields in the verify
> > > callback, on big endian platforms, and with a possibly transitional
> > > device is the following. The verify() callback is called between
> > > config->get_features() and virtio_finalize_features(). That we have a
> > > device that offered F_VERSION_1 then we have the following options
> > > either the device is transitional, and then it has to present the legacy
> > > interface, i.e. a big endian config space until F_VERSION_1 is
> > > negotiated, or we have a non-transitional device, which makes
> > > F_VERSION_1 mandatory, and only implements the non-legacy interface and
> > > thus presents a little endian config space. Because at this point we
> > > can't know if the device is transitional or non-transitional, we can't
> > > know do we need to byte swap or not.  
> > 
> > Hmm which transport does this refer to?
> 
> It is the same with virtio-ccw and virtio-pci. I see the same problem
> with both on s390x. I didn't try with virtio-blk-pci-non-transitional
> yet (have to figure out how to do that with libvirt) for pci I used
> virtio-blk-pci.
> 
> > Distinguishing between legacy and modern drivers is transport
> > specific.  PCI presents
> > legacy and modern at separate addresses so distinguishing
> > between these two should be no trouble.
> 
> You mean the device id? Yes that is bolted down in the spec, but
> currently we don't exploit that information. Furthermore there
> is a fat chance that with QEMU even the allegedly non-transitional
> devices only present a little endian config space after VERSION_1
> was negotiated. Namely get_config for virtio-blk is implemented in
> virtio_blk_update_config() which does virtio_stl_p(vdev,
> &blkcfg.blk_size, blk_size) and in there we don't care
> about transitional or not:
> 
> static inline bool virtio_access_is_big_endian(VirtIODevice *vdev)
> {
> #if defined(LEGACY_VIRTIO_IS_BIENDIAN)
>     return virtio_is_big_endian(vdev);
> #elif defined(TARGET_WORDS_BIGENDIAN)
>     if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
>         /* Devices conforming to VIRTIO 1.0 or later are always LE. */
>         return false;
>     }
>     return true;
> #else
>     return false;
> #endif
> }
> 

OK, so that's a QEMU bug. Any device compatible with virtio 1.0 and up
must use LE.
It can also present a legacy config space where the
endianness depends on the guest.

> > Channel i/o has versioning so same thing?
> >
> 
> Don't think so. Both a transitional and a non-transitional device
> would have to accept revisions higher than 0 if the driver tried to
> negotiate those (and we do in our case).

Yes, the modern driver does. And that one is known to be LE.
A legacy driver doesn't.

> > > The virtio spec explicitly states that the driver MAY read config
> > > between reading and writing the features so saying that first accessing
> > > the config before feature negotiation is done is not an option. The
> > > specification ain't clear about setting the features multiple times
> > > before FEATURES_OK, so I guess that should be fine.
> > > 
> > > I don't consider this patch super clean, but frankly I don't think we
> > > have a ton of options. Another option that may or man not be cleaner,
> > > but is also IMHO much uglier is to figure out whether the device is
> > > transitional by rejecting _F_VERSION_1, then resetting it and proceeding
> > > according tho what we have figured out, hoping that the characteristics
> > > of the device didn't change.  
> > 
> > I am confused here. So is the problem at the device or at the driver level?
> 
> We have a driver regression. Since the 82e89ea077b9 ("virtio-blk: Add
> validation for block size in config space") virtio-blk is broken on
> s390.

Because of a qemu bug. I agree. It's worth working around in the driver
since the qemu bug has been around for a very long time.


> The deeper problem is in the spec. We stated that the driver may read
> config space before the feature negotiation is finalized, but we didn't
> think enough about what happens when native endianness is not little
> endian in the different cases.

Because the spec is very clear that endian-ness is LE.
I don't see a spec issue yet here, just an implementation issue.

> I believe, for non-transitional devices we have a problem in the host as
> well (i.e. in QEMU).

Because QEMU ignores the spec and instead relies on the feature
negotiation.

> 
> > I suspect it's actually the host that has the issue, not
> > the guest?
> 
> I tend to say we have a problem both in the host and in the guest. I'm
> more concerned about the problem in the guest, because that is a really
> nasty regression.

The problem is in the guest. The bug is in the host ;)

> For the host. I think for legacy we don't have a
> problem, because both sides would operate on the assumption no
> _F_VERSION_1, IMHO the implementation for the transitional devices is
> correct.

Well no, the point of transitional is really to be 1.0 compliant
*and* also expose a legacy interface.

> For non-transitional flavor, it depends on the device. For
> example virtio-net and virtio-blk is broken, because we use primitives
> like virtio_stl_p() and those don't do the right thing before feature
> negotiation is completed. On the other hand virtio-crypto.c as a truly
> non-transitional device uses stl_le_p() and IMHO does the right thing.
> 
> Thanks for your comments! I hope I managed to answer your questions. I
> need some guidance on how do we want to move forward on this.
> 
> Regards,
> Halil

OK so. I don't have a problem with the patch itself,
assuming it's enough to work around all buggy hosts.
I am especially worried about things like vhost/vhost-user;
I suspect they might have a bug like this too, and
I am not sure whether your workaround is enough for these.
Can you check please?

If not we'll have to move all validate code to after FEATURES_OK
is set.

We do however want to document that this API can be called
multiple times since that was not the case
previously.

Also, I would limit this to when
- the validate callback exists
- the guest endian-ness is not LE

We also want to document the QEMU bug in a comment here,
e.g. 

/*
 * QEMU before version 6.2 incorrectly uses driver features with guest
 * endian-ness to set endian-ness for config space instead of just using
 * LE for the modern interface as per spec.
 * This breaks reading config in the validate callback.
 * To work around that, when device is 1.0 (so supposed to be LE)
 * but guest is not LE, then send the features to device one extra
 * time before validation.
 */

Finally I'd like to see the QEMU bug fix before I merge this one,
since it will be harder to test with a fix.
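
For illustration only, the conditions listed above could be gathered into
one predicate. This is a standalone model and not kernel code: struct
probe_ctx, its fields and needs_early_feature_write() are names made up for
this sketch. Only the bit number of VIRTIO_F_VERSION_1 (32) is taken from
the spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* VIRTIO_F_VERSION_1 is feature bit 32 in the virtio specification. */
#define VIRTIO_F_VERSION_1 32

/* Hypothetical stand-in for the state available during device probe. */
struct probe_ctx {
	uint64_t device_features;	/* features offered by the device */
	bool has_validate;		/* driver provides a validate callback */
	bool guest_is_le;		/* guest is little endian */
};

/*
 * Return true when the extra "write features before validate" step
 * would apply: the device is 1.0 (so config space is supposed to be LE),
 * the driver will read config in validate, and the guest is not LE.
 */
static bool needs_early_feature_write(const struct probe_ctx *ctx)
{
	assert(ctx);

	if (!ctx->has_validate)
		return false;	/* nobody reads config before FEATURES_OK */
	if (ctx->guest_is_le)
		return false;	/* LE guest: no byte swap ambiguity */

	return ctx->device_features & (1ULL << VIRTIO_F_VERSION_1);
}
```

With this shape, little-endian guests and drivers without a validate
callback never see the extra feature write, so the workaround stays confined
to the configurations that can hit the bug.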




> > 
> > 
> > > Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
> > > Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config space")
> > > Reported-by: markver@us.ibm.com
> > > ---
> > >  drivers/virtio/virtio.c | 4 ++++
> > >  1 file changed, 4 insertions(+)
> > > 
> > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > index 0a5b54034d4b..9dc3cfa17b1c 100644
> > > --- a/drivers/virtio/virtio.c
> > > +++ b/drivers/virtio/virtio.c
> > > @@ -249,6 +249,10 @@ static int virtio_dev_probe(struct device *_d)
> > >  		if (device_features & (1ULL << i))
> > >  			__virtio_set_bit(dev, i);
> > >  
> > > +	/* Write back features before validate to know endianness */
> > > +	if (device_features & (1ULL << VIRTIO_F_VERSION_1))
> > > +		dev->config->finalize_features(dev);
> > > +
> > >  	if (drv->validate) {
> > >  		err = drv->validate(dev);
> > >  		if (err)
> > > 
> > > base-commit: 02d5e016800d082058b3d3b7c3ede136cdc6ddcb
> > > -- 
> > > 2.25.1  
> > 


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-09-30 11:31     ` Cornelia Huck
  2021-10-01 14:22       ` Halil Pasic
@ 2021-10-02 12:09       ` Michael S. Tsirkin
  1 sibling, 0 replies; 52+ messages in thread
From: Michael S. Tsirkin @ 2021-10-02 12:09 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Halil Pasic, Jason Wang, Xie Yongji, virtualization,
	linux-kernel, markver, Christian Borntraeger, linux-s390

On Thu, Sep 30, 2021 at 01:31:04PM +0200, Cornelia Huck wrote:
> On Thu, Sep 30 2021, Halil Pasic <pasic@linux.ibm.com> wrote:
> 
> > On Thu, 30 Sep 2021 11:28:23 +0200
> > Cornelia Huck <cohuck@redhat.com> wrote:
> >
> >> On Thu, Sep 30 2021, Halil Pasic <pasic@linux.ibm.com> wrote:
> >> 
> >> > This patch fixes a regression introduced by commit 82e89ea077b9
> >> > ("virtio-blk: Add validation for block size in config space") and
> >> > enables similar checks in verify() on big endian platforms.
> >> >
> >> > The problem with checking multi-byte config fields in the verify
> >> > callback, on big endian platforms, and with a possibly transitional
> >> > device is the following. The verify() callback is called between
> >> > config->get_features() and virtio_finalize_features(). If we have a
> >> > device that offered F_VERSION_1, then we have the following options:
> >> > either the device is transitional, and then it has to present the legacy
> >> > interface, i.e. a big endian config space until F_VERSION_1 is
> >> > negotiated, or we have a non-transitional device, which makes
> >> > F_VERSION_1 mandatory, and only implements the non-legacy interface and
> >> > thus presents a little endian config space. Because at this point we
> >> > can't know if the device is transitional or non-transitional, we can't
> >> > know whether we need to byte swap or not.
> >> >
> >> > The virtio spec explicitly states that the driver MAY read config
> >> > between reading and writing the features so saying that first accessing
> >> > the config before feature negotiation is done is not an option. The
> >> > specification ain't clear about setting the features multiple times
> >> > before FEATURES_OK, so I guess that should be fine.
> >> >
> >> > I don't consider this patch super clean, but frankly I don't think we
> >> > have a ton of options. Another option that may or may not be cleaner,
> >> > but is also IMHO much uglier is to figure out whether the device is
> >> > transitional by rejecting _F_VERSION_1, then resetting it and proceeding
> >> > according to what we have figured out, hoping that the characteristics
> >> > of the device didn't change.
> >> >
> >> > Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
> >> > Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config space")
> >> > Reported-by: markver@us.ibm.com
> >> > ---
> >> >  drivers/virtio/virtio.c | 4 ++++
> >> >  1 file changed, 4 insertions(+)
> >> >
> >> > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> >> > index 0a5b54034d4b..9dc3cfa17b1c 100644
> >> > --- a/drivers/virtio/virtio.c
> >> > +++ b/drivers/virtio/virtio.c
> >> > @@ -249,6 +249,10 @@ static int virtio_dev_probe(struct device *_d)
> >> >  		if (device_features & (1ULL << i))
> >> >  			__virtio_set_bit(dev, i);
> >> >  
> >> > +	/* Write back features before validate to know endianness */
> >> > +	if (device_features & (1ULL << VIRTIO_F_VERSION_1))
> >> > +		dev->config->finalize_features(dev);  
> >> 
> >> This really looks like a mess :(
> >> 
> >> We end up calling ->finalize_features twice: once before ->validate, and
> >> once after, that time with the complete song and dance. The first time,
> >> we operate on one feature set; after validation, we operate on another,
> >> and there might be interdependencies between the two (like that a bit
> >> is cleared because of another bit, which would not happen if validate
> >> had a chance to clear that bit before).
> >
> > Basically the second set is a subset of the first set.
> 
> I don't think that's clear.
> 
> >
> >> 
> >> I'm not sure whether that is even a problem in the spec: while the
> >> driver may read the config before finally accepting features
> >
> > I'm not sure I'm following you. Let me please quote the specification:
> > """
> > 4. Read device feature bits, and write the subset of feature bits
> > understood by the OS and driver to the device. During this step the
> > driver MAY read (but MUST NOT write) the device-specific configuration
> > fields to check that it can support the device before accepting it.
> > 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new
> > feature bits after this step.
> > """
> > https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-930001
> 
> Yes, exactly, it MAY read before accepting features. How does the device
> know whether the config space is little-endian or not?

I think it knows simply because the spec says it's little-endian.



> >
> >> , it does
> >> not really make sense to do so before a feature bit as basic as
> >> VERSION_1 which determines the endianness has been negotiated. 
> >
> > Are you suggesting that ->verify() should be after
> > virtio_finalize_features()?
> 
> No, that would defeat the entire purpose of verify. After
> virtio_finalize_features(), we are done with feature negotiation.
> 
> > Wouldn't
> > that mean that verify() can't reject feature bits? But that is the whole
> > point of commit 82e89ea077b9 ("virtio-blk: Add validation for block size
> > in config space"). Do you think that the commit in question is
> > conceptually flawed? My understanding of the verify is, that it is supposed
> > to fence features and feature bits we can't support, e.g. because of
> > config space things, but I may be wrong.
> 
> No, that commit is not really flawed on its own, I think the whole
> procedure may be problematic.
> 
> >
> > The trouble is, feature bits are not negotiated one by one, but basically all
> > at once. I suppose, I did the next best thing to first negotiating
> > VERSION_1.
> 
> We probably need to special-case VERSION_1 to move at least forward;
> i.e. proceed as if we accepted it when reading the config space.
> 
> The problem is that we do not know what the device assumes when we read
> the config space prior to setting FEATURES_OK. It may assume
> little-endian if it offered VERSION_1, or it may not. The spec does not
> really say what happens before feature negotiation has finished.


So if your device is non-transitional then it's LE.
If it's transitional it exposes a legacy interface
in addition to the modern one, and that one is guest endian.
How does device know which interface is used?
E.g. for PCI it's a separate address range.

For ccw why not check the revision? legacy drivers use 0 for that.


> >
> >
> >> For
> >> VERSION_1, we can probably go ahead and just assume that we will accept
> >> it if offered, but what about other (future) bits?
> >
> > I don't quite understand.
> 
> There might be other bits in the future that change how the config space
> works. We cannot assume that any of those bits will be accepted if
> offered; i.e. we need a special hack for VERSION_1.
> 
> >
> > Anyway, how do you think we should solve this problem?
> 
> This is a mess. For starters, we need to think about if we should do
> something in the spec, and if yes, what. Then, we can probably think
> about how to implement that properly.
> 
> As we have an error right now that is basically a regression, we
> probably need a band-aid to keep going. Not sure if your patch is the
> right approach, maybe we really need to special-case VERSION_1 (the
> "assume we accepted it" hack mentioned above.) This will likely fix the
> reported problem (I assume that is s390x on QEMU); do we know about
> other VMMs? Any other big-endian architectures?
> 
> Anyone have any better suggestions?


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-01 15:18         ` Cornelia Huck
@ 2021-10-02 18:13           ` Michael S. Tsirkin
  2021-10-04  2:23             ` Halil Pasic
  2021-10-04  7:01             ` Cornelia Huck
  0 siblings, 2 replies; 52+ messages in thread
From: Michael S. Tsirkin @ 2021-10-02 18:13 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Halil Pasic, Jason Wang, Xie Yongji, virtualization,
	linux-kernel, markver, Christian Borntraeger, linux-s390

On Fri, Oct 01, 2021 at 05:18:46PM +0200, Cornelia Huck wrote:
> I'd say we need a hack here so that we assume little-endian config space
> if VERSION_1 has been offered; if your patch here works, I assume QEMU
> does what we expect (assuming little-endian as well.) I'm mostly
> wondering what happens if you use a different VMM; can we expect it to
> work similar to QEMU?

Hard to say of course ... hopefully other VMMs are actually
implementing the spec. E.g. IIUC rust vmm is modern only.


> Even if it helps for s390, we should double-check
> what happens for other architectures.
> 
> >
> >> 
> >> Anyone have any better suggestions?
> >> 
> >
> > There is the conditional compile, as an option but I would not say it is
> > better.
> 
> Yes, I agree.
> 
> Anyone else have an idea? This is a nasty regression; we could revert the
> patch, which would remove the symptoms and give us some time, but that
> doesn't really feel right, I'd do that only as a last resort.

Well we have Halil's hack (except I would limit it
to only apply to BE, only do devices with validate,
and only in modern mode), and we will fix QEMU to be spec compliant.
Between these why do we need any conditional compiles?

-- 
MST


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-09-30 11:36   ` Cornelia Huck
@ 2021-10-02 18:20     ` Michael S. Tsirkin
  2021-10-03  5:00       ` Halil Pasic
  0 siblings, 1 reply; 52+ messages in thread
From: Michael S. Tsirkin @ 2021-10-02 18:20 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Halil Pasic, Jason Wang, Xie Yongji, virtualization,
	linux-kernel, markver, Christian Borntraeger, linux-s390

On Thu, Sep 30, 2021 at 01:36:27PM +0200, Cornelia Huck wrote:
> On Thu, Sep 30 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Thu, Sep 30, 2021 at 03:20:49AM +0200, Halil Pasic wrote:
> >> This patch fixes a regression introduced by commit 82e89ea077b9
> >> ("virtio-blk: Add validation for block size in config space") and
> >> enables similar checks in verify() on big endian platforms.
> >> 
> >> The problem with checking multi-byte config fields in the verify
> >> callback, on big endian platforms, and with a possibly transitional
> >> device is the following. The verify() callback is called between
> >> config->get_features() and virtio_finalize_features(). If we have a
> >> device that offered F_VERSION_1, then we have the following options:
> >> either the device is transitional, and then it has to present the legacy
> >> interface, i.e. a big endian config space until F_VERSION_1 is
> >> negotiated, or we have a non-transitional device, which makes
> >> F_VERSION_1 mandatory, and only implements the non-legacy interface and
> >> thus presents a little endian config space. Because at this point we
> >> can't know if the device is transitional or non-transitional, we can't
> >> know whether we need to byte swap or not.
> >
> > Hmm which transport does this refer to?
> > Distinguishing between legacy and modern drivers is transport
> > specific.  PCI presents
> > legacy and modern at separate addresses so distinguishing
> > between these two should be no trouble.
> 
> Hm, what about transitional devices?

Transitional devices can be accessed through a modern
or a legacy interface, not both. The device knows how
it's accessed. It should key endian-ness decisions on
this, not on feature negotiation.

> > Channel i/o has versioning so same thing?
> 
> It can turn off VERSION_1, but not legacy. (I had hacked up a patchset
> to potentially disable legacy some time ago, but did not have any
> resources to follow up on this.)

That's ok, my point is that revision is negotiated before config
accesses, IIUC a legacy driver expecting BE will use revision 0, modern
one will use revision 1 and up.
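
To make the point concrete, here is a small standalone sketch of a device
model that picks the config space byte order from the transport revision
alone, never from the feature bits. read_config32() and the load helpers are
hypothetical names; the mapping "revision 0 is legacy and guest endian,
revision 1 and up is modern and LE" is the assumption stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decode four raw config space bytes as a little-endian 32-bit value. */
static uint32_t load_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Decode four raw config space bytes as a big-endian 32-bit value. */
static uint32_t load_be32(const uint8_t *p)
{
	return (uint32_t)p[3] | (uint32_t)p[2] << 8 |
	       (uint32_t)p[1] << 16 | (uint32_t)p[0] << 24;
}

/*
 * Hypothetical helper: interpret a 32-bit config field.
 * revision >= 1: modern interface, always little endian.
 * revision == 0: legacy interface, guest endian.
 */
static uint32_t read_config32(const uint8_t *raw, unsigned int revision,
			      bool guest_is_be)
{
	assert(raw);

	if (revision >= 1)
		return load_le32(raw);

	return guest_is_be ? load_be32(raw) : load_le32(raw);
}
```

A block size of 512 stored as LE bytes 00 02 00 00 decodes to 512 with
revision 1, while decoding the same bytes as BE yields 131072, which is the
kind of value a size check in validate would reject.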

> 
> >
> >> The virtio spec explicitly states that the driver MAY read config
> >> between reading and writing the features so saying that first accessing
> >> the config before feature negotiation is done is not an option. The
> >> specification ain't clear about setting the features multiple times
> >> before FEATURES_OK, so I guess that should be fine.
> >> 
> >> I don't consider this patch super clean, but frankly I don't think we
> >> have a ton of options. Another option that may or may not be cleaner,
> >> but is also IMHO much uglier is to figure out whether the device is
> >> transitional by rejecting _F_VERSION_1, then resetting it and proceeding
> >> according to what we have figured out, hoping that the characteristics
> >> of the device didn't change.
> >
> > I am confused here. So is the problem at the device or at the driver level?
> > I suspect it's actually the host that has the issue, not
> > the guest?
> 
> From my perspective the problem is that the version of the device
> remains in limbo as long as the features have not yet been finalized,
> which means that the endianness of the config space remains in limbo as
> well. Both device and driver might come to different conclusions.

Version === legacy versus modern?
It is true that feature negotiation can not be used by device to decide that
question simply because it happens too late.
So let's not use it for that then ;)

Yes we have VERSION_1 which looks like it should allow this, but
unfortunately it only helps with that for the driver, not the device.

In practice legacy versus modern has to be determined by
transport specific versioning, luckily we have that for all
specified transports (can't say what happens with rproc).

> 
> >
> >
> >> Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
> >> Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config space")
> >> Reported-by: markver@us.ibm.com
> >> ---
> >>  drivers/virtio/virtio.c | 4 ++++
> >>  1 file changed, 4 insertions(+)
> >> 
> >> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> >> index 0a5b54034d4b..9dc3cfa17b1c 100644
> >> --- a/drivers/virtio/virtio.c
> >> +++ b/drivers/virtio/virtio.c
> >> @@ -249,6 +249,10 @@ static int virtio_dev_probe(struct device *_d)
> >>  		if (device_features & (1ULL << i))
> >>  			__virtio_set_bit(dev, i);
> >>  
> >> +	/* Write back features before validate to know endianness */
> >> +	if (device_features & (1ULL << VIRTIO_F_VERSION_1))
> >> +		dev->config->finalize_features(dev);
> >> +
> >>  	if (drv->validate) {
> >>  		err = drv->validate(dev);
> >>  		if (err)
> >> 
> >> base-commit: 02d5e016800d082058b3d3b7c3ede136cdc6ddcb
> >> -- 
> >> 2.25.1


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-02 18:20     ` Michael S. Tsirkin
@ 2021-10-03  5:00       ` Halil Pasic
  2021-10-03  6:42         ` Michael S. Tsirkin
  0 siblings, 1 reply; 52+ messages in thread
From: Halil Pasic @ 2021-10-03  5:00 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Cornelia Huck, Jason Wang, Xie Yongji, virtualization,
	linux-kernel, markver, Christian Borntraeger, linux-s390,
	Halil Pasic

On Sat, 2 Oct 2021 14:20:47 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> > From my perspective the problem is that the version of the device
> > remains in limbo as long as the features have not yet been finalized,
> > which means that the endianness of the config space remains in limbo as
> > well. Both device and driver might come to different conclusions.  
> 
> Version === legacy versus modern?
> It is true that feature negotiation can not be used by device to decide that
> question simply because it happens too late.
> So let's not use it for that then ;)
> 
> Yes we have VERSION_1 which looks like it should allow this, but
> unfortunately it only helps with that for the driver, not the device.
> 
> In practice legacy versus modern has to be determined by
> transport specific versioning, luckily we have that for all
> specified transports (can't say what happens with rproc).

So if we look at ccw, you say that the revision negotiation already
determines whether VERSION_1 is negotiated or not, and the
feature bit VERSION_1 is superfluous?

That would also imply, that 
1) if revision > 0 was negotiated then the device must offer VERSION_1
2) if revision > 0 was negotiated and the driver cleared VERSION_1
   the device must refuse to operate.
3) if revision > 0 was negotiated then the driver should reject 
   to drive a device if it does not offer VERSION_1
4) if revision > 0 was negotiated the driver must accept VERSION_1
5) if revision > 0 was *not* negotiated then the device should not offer
   VERSION_1 because at this point it is already certain that the device
   can not act in accordance to the virtio 1.0 or higher interface.

Does that sound about right?
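
For illustration, the five rules above can be written down as a small
standalone check. The enum values and check_negotiation() are hypothetical
names for this sketch, and the mapping of rules to outcomes is one reading of
the list, not spec text.

```c
#include <assert.h>
#include <stdbool.h>

/* Possible outcomes of the model; names are made up for this sketch. */
enum neg_outcome {
	NEG_OK,				/* combination is consistent */
	NEG_DRIVER_REJECTS,		/* rules 1 and 3 */
	NEG_DEVICE_REFUSES,		/* rules 2 and 4 */
	NEG_DEVICE_SHOULD_NOT_OFFER,	/* rule 5 */
};

/*
 * revision: negotiated ccw revision (0 means legacy)
 * offered:  device offers VERSION_1
 * accepted: driver accepted VERSION_1
 */
static enum neg_outcome check_negotiation(unsigned int revision,
					  bool offered, bool accepted)
{
	if (revision > 0) {
		if (!offered)
			return NEG_DRIVER_REJECTS;
		if (!accepted)
			return NEG_DEVICE_REFUSES;
		return NEG_OK;
	}

	/* Revision 0: the device can not act as a virtio 1.0 device. */
	if (offered)
		return NEG_DEVICE_SHOULD_NOT_OFFER;

	return NEG_OK;
}
```

In this reading, once revision > 0 is negotiated the only consistent state
is "offered and accepted", which is why the feature bit looks redundant.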

IMHO we should also change 
https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-160003
and the definition of VERSION_1 because both sides have to know what is
going on before features are fully negotiated. Or?

Regards,
Halil




^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-03  5:00       ` Halil Pasic
@ 2021-10-03  6:42         ` Michael S. Tsirkin
  2021-10-03  7:26           ` Michael S. Tsirkin
  0 siblings, 1 reply; 52+ messages in thread
From: Michael S. Tsirkin @ 2021-10-03  6:42 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Cornelia Huck, Jason Wang, Xie Yongji, virtualization,
	linux-kernel, markver, Christian Borntraeger, linux-s390

On Sun, Oct 03, 2021 at 07:00:30AM +0200, Halil Pasic wrote:
> On Sat, 2 Oct 2021 14:20:47 -0400
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > > From my perspective the problem is that the version of the device
> > > remains in limbo as long as the features have not yet been finalized,
> > > which means that the endianness of the config space remains in limbo as
> > > well. Both device and driver might come to different conclusions.  
> > 
> > Version === legacy versus modern?
> > It is true that feature negotiation can not be used by device to decide that
> > question simply because it happens too late.
> > So let's not use it for that then ;)
> > 
> > Yes we have VERSION_1 which looks like it should allow this, but
> > unfortunately it only helps with that for the driver, not the device.
> > 
> > In practice legacy versus modern has to be determined by
> > transport specific versioning, luckily we have that for all
> > specified transports (can't say what happens with rproc).
> 
> So if we look at ccw, you say that the revision negotiation already
> determines whether VERSION_1 is negotiated or not, and the
> feature bit VERSION_1 is superfluous?
> 
> That would also imply, that 
> 1) if revision > 0 was negotiated then the device must offer VERSION_1
> 2) if revision > 0 was negotiated and the driver cleared VERSION_1
>    the device must refuse to operate.
> 3) if revision > 0 was negotiated then the driver should reject 
>    to drive a device if it does not offer VERSION_1
> 4) if revision > 0 was negotiated the driver must accept VERSION_1
> 5) if revision > 0 was *not* negotiated then the device should not offer
>    VERSION_1 because at this point it is already certain that the device
>    can not act in accordance to the virtio 1.0 or higher interface.
> 
> Does that sound about right?

To me, it does.

> IMHO we should also change 
> https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-160003
> and the definition of VERSION_1 because both sides have to know what is
> going on before features are fully negotiated. Or?
> 
> Regards,
> Halil
> 

I guess so. And I guess we need transport-specific sections
describing this behaviour for each transport.

So something like this, for starters?

diff --git a/content.tex b/content.tex
index 1398390..c526dd3 100644
--- a/content.tex
+++ b/content.tex
@@ -140,10 +140,13 @@ \subsection{Legacy Interface: A Note on Feature
 Bits}\label{sec:Basic Facilities of a Virtio Device / Feature
 Bits / Legacy Interface: A Note on Feature Bits}
 
-Transitional Drivers MUST detect Legacy Devices by detecting that
-the feature bit VIRTIO_F_VERSION_1 is not offered.
-Transitional devices MUST detect Legacy drivers by detecting that
-VIRTIO_F_VERSION_1 has not been acknowledged by the driver.
+Transitional drivers MAY support operating legacy devices.
+Transitional devices MAY support operation by legacy drivers.
+
+Transitional drivers MUST detect legacy devices in a way that is
+transport specific.
+Transitional devices MUST detect legacy drivers in a way that
+is transport specific.
 
 In this case device is used through the legacy interface.
 
@@ -160,6 +163,25 @@ \subsection{Legacy Interface: A Note on Feature
 Specification text within these sections generally does not apply
 to non-transitional devices.
 
+\begin{note}
+The device offers different features when used through
+the legacy interface and when operated in accordance with this
+specification.
+\end{note}
+
+Transitional drivers MUST use Devices only through the legacy interface
+if the feature bit VIRTIO_F_VERSION_1 is not offered.
+Transitional devices MUST NOT offer VIRTIO_F_VERSION_1 when used through
+the legacy interface.
+
+When the driver uses a device through the legacy interface, then it
+MUST only accept the features the device offered through the
+legacy interface.
+
+When used through the legacy interface, the device SHOULD
+validate that the driver only accepted the features it
+offered through the legacy interface.
+
 \section{Notifications}\label{sec:Basic Facilities of a Virtio Device
 / Notifications}
 


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-03  6:42         ` Michael S. Tsirkin
@ 2021-10-03  7:26           ` Michael S. Tsirkin
  2021-10-04 12:01             ` Cornelia Huck
  0 siblings, 1 reply; 52+ messages in thread
From: Michael S. Tsirkin @ 2021-10-03  7:26 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Cornelia Huck, Jason Wang, Xie Yongji, virtualization,
	linux-kernel, markver, Christian Borntraeger, linux-s390,
	virtio-dev

On Sun, Oct 03, 2021 at 02:42:30AM -0400, Michael S. Tsirkin wrote:
> On Sun, Oct 03, 2021 at 07:00:30AM +0200, Halil Pasic wrote:
> > On Sat, 2 Oct 2021 14:20:47 -0400
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > 
> > > > From my perspective the problem is that the version of the device
> > > > remains in limbo as long as the features have not yet been finalized,
> > > > which means that the endianness of the config space remains in limbo as
> > > > well. Both device and driver might come to different conclusions.  
> > > 
> > > Version === legacy versus modern?
> > > It is true that feature negotiation can not be used by device to decide that
> > > question simply because it happens too late.
> > > So let's not use it for that then ;)
> > > 
> > > Yes we have VERSION_1 which looks like it should allow this, but
> > > unfortunately it only helps with that for the driver, not the device.
> > > 
> > > In practice legacy versus modern has to be determined by
> > > transport specific versioning, luckily we have that for all
> > > specified transports (can't say what happens with rproc).
> > 
> > So if we look at ccw, you say that the revision negotiation already
> > determines whether VERSION_1 is negotiated or not, and the
> > feature bit VERSION_1 is superfluous?
> > 
> > That would also imply, that 
> > 1) if revision > 0 was negotiated then the device must offer VERSION_1
> > 2) if revision > 0 was negotiated and the driver cleared VERSION_1
> >    the device must refuse to operate.
> > 3) if revision > 0 was negotiated then the driver should reject 
> >    to drive a device if it does not offer VERSION_1
> > 4) if revision > 0 was negotiated the driver must accept VERSION_1
> > 5) if revision > 0 was *not* negotiated then the device should not offer
> >    VERSION_1 because at this point it is already certain that the device
> >    can not act in accordance to the virtio 1.0 or higher interface.
> > 
> > Does that sound about right?
> 
> To me, it does.
> 
> > IMHO we should also change 
> > https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-160003
> > and the definition of VERSION_1 because both sides have to know what is
> > going on before features are fully negotiated. Or?
> > 
> > Regards,
> > Halil
> > 
> 
> I guess so. And I guess we need transport-specific sections
> describing this behaviour for each transport.
> 
> So something like this, for starters?

Sent too early. So here's what I propose. Could you pls take a look
and if you like this, post a ccw section?
There's also an attempt to prevent fallback from modern to legacy
here since if driver does fallback then failing FEATURES_OK can't work
properly.
That's a separate issue, will be a separate patch when I post
this for consideration by the TC.


diff --git a/content.tex b/content.tex
index 1398390..06271f4 100644
--- a/content.tex
+++ b/content.tex
@@ -140,10 +140,13 @@ \subsection{Legacy Interface: A Note on Feature
 Bits}\label{sec:Basic Facilities of a Virtio Device / Feature
 Bits / Legacy Interface: A Note on Feature Bits}
 
-Transitional Drivers MUST detect Legacy Devices by detecting that
-the feature bit VIRTIO_F_VERSION_1 is not offered.
-Transitional devices MUST detect Legacy drivers by detecting that
-VIRTIO_F_VERSION_1 has not been acknowledged by the driver.
+Transitional drivers MAY support operating legacy devices.
+Transitional devices MAY support operation by legacy drivers.
+
+Transitional drivers MUST detect legacy devices in a way that is
+transport specific.
+Transitional devices MUST detect legacy drivers in a way that
+is transport specific.
 
 In this case device is used through the legacy interface.
 
@@ -160,6 +163,33 @@ \subsection{Legacy Interface: A Note on Feature
 Specification text within these sections generally does not apply
 to non-transitional devices.
 
+\begin{note}
+The device offers different features when used through
+the legacy interface and when operated in accordance with this
+specification.
+\end{note}
+
+Transitional drivers MUST use Devices only through the legacy interface
+if the feature bit VIRTIO_F_VERSION_1 is not offered.
+Transitional devices MUST NOT offer VIRTIO_F_VERSION_1 when used through
+the legacy interface.
+
+When the driver uses a device through the legacy interface, then it
+MUST only accept the features the device offered through the
+legacy interface.
+
+When used through the legacy interface, the device SHOULD
+validate that the driver only accepted the features it
+offered through the legacy interface.
+
+When operating a transitional device, a transitional driver
+SHOULD NOT use the device through the legacy interface if
+operation through the modern interface has failed.
+In particular, a transitional driver
+SHOULD NOT fall back to using the device through the
+legacy interface if feature negotiation failed
+(since that would defeat the purpose of the FEATURES_OK bit).
+
 \section{Notifications}\label{sec:Basic Facilities of a Virtio Device
 / Notifications}
 
@@ -1003,6 +1033,12 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
 
 The driver MUST NOT write a 0 to \field{queue_enable}.
 
+\paragraph{Legacy Interface: Common configuration structure layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Legacy Interface: Common configuration structure layout}
+Transitional drivers SHOULD detect legacy devices by detecting
+that the device has the Transitional PCI Device ID in
+the range 0x1000 to 0x103f and lacks a VIRTIO_PCI_CAP_COMMON_CFG
+capability specifying the location of a common configuration structure.
+
 \subsubsection{Notification structure layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability}
 
 The notification location is found using the VIRTIO_PCI_CAP_NOTIFY_CFG
@@ -1288,6 +1324,10 @@ \subsubsection{Legacy Interfaces: A Note on PCI Device Layout}\label{sec:Virtio
 Transitional devices MUST present part of configuration
 registers in a legacy configuration structure in BAR0 in the first I/O
 region of the PCI device, as documented below.
+
+Transitional devices SHOULD detect legacy drivers by detecting
+access to the legacy configuration structure.
+
 When using the legacy interface, transitional drivers
 MUST use the legacy configuration structure in BAR0 in the first
 I/O region of the PCI device, as documented below.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-02 18:13           ` Michael S. Tsirkin
@ 2021-10-04  2:23             ` Halil Pasic
  2021-10-04  9:07               ` Michael S. Tsirkin
  2021-10-04  7:01             ` Cornelia Huck
  1 sibling, 1 reply; 52+ messages in thread
From: Halil Pasic @ 2021-10-04  2:23 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Cornelia Huck, Jason Wang, Xie Yongji, virtualization,
	linux-kernel, markver, Christian Borntraeger, linux-s390,
	Halil Pasic

On Sat, 2 Oct 2021 14:13:37 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> > Anyone else have an idea? This is a nasty regression; we could revert the
> > patch, which would remove the symptoms and give us some time, but that
> > doesn't really feel right, I'd do that only as a last resort.  
> 
> Well we have Halil's hack (except I would limit it
> to only apply to BE, only do devices with validate,
> and only in modern mode), and we will fix QEMU to be spec compliant.
> Between these why do we need any conditional compiles?

We don't. As I stated before, this hack is flawed because it
effectively breaks fencing of features by the driver with QEMU. Some
features cannot be unset once they have been set: we tend to enable
the corresponding functionality whenever we see a features write with
the feature bit set, and we don't disable it if a subsequent features
write stores the feature bit as not set.
But it looks like VIRTIO_1 is fine to get cleared afterwards. So my hack
should actually look like the one posted below, modulo conditions.
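
To illustrate the fencing problem, here is a toy model of such a
device. It is only a sketch, not QEMU code: struct toy_device,
toy_write_features() and TOY_F_OFFLOAD are made-up names, and the
"offload" side effect stands in for whatever functionality a real
feature bit would enable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_F_OFFLOAD   5   /* hypothetical feature bit with a side effect */
#define TOY_F_VERSION_1 32  /* same bit number as VIRTIO_F_VERSION_1 */

/* Hypothetical device state; not taken from any real VMM. */
struct toy_device {
	uint64_t guest_features;  /* last value written by the driver */
	bool     offload_enabled; /* functionality tied to TOY_F_OFFLOAD */
};

/* Features write: enables on set, but never disables on clear. */
static void toy_write_features(struct toy_device *d, uint64_t features)
{
	assert(d != NULL);
	d->guest_features = features;
	if (features & (1ULL << TOY_F_OFFLOAD))
		d->offload_enabled = true;
	/* Deliberately no "else" branch that disables: this is the flaw. */
}
```

With this model, writing all offered features first and a reduced set
afterwards leaves the functionality enabled even though the driver
fenced the bit, while writing only F_VERSION_1 early has no such side
effect.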

Regarding the conditions, I guess checking that driver_features has
F_VERSION_1 already satisfies "only modern mode", right? For now
I've deliberately omitted the "has validate" and the "is big endian"
conditions so we have a better chance to see if something breaks
(i.e. the approach does not work). I can add those extra conditions
later.
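
For reference, the full set of conditions discussed so far could be
expressed roughly like this. It is a standalone sketch for discussion
only: struct probe_state and want_early_version_1() are hypothetical
names and do not exist in drivers/virtio/virtio.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_F_VERSION_1 32 /* feature bit number from the virtio spec */

/* Hypothetical snapshot of what probe knows before validate runs. */
struct probe_state {
	uint64_t device_features;     /* offered by the device */
	uint64_t driver_features;     /* supported by the driver */
	bool     has_validate;        /* driver has a validate callback */
	bool     guest_is_big_endian; /* native endianness is not LE */
};

/* Should F_VERSION_1 be written to the device before validate? */
static bool want_early_version_1(const struct probe_state *s)
{
	uint64_t v1 = 1ULL << VIRTIO_F_VERSION_1;

	assert(s != NULL);
	/* "Only modern mode": both sides must have F_VERSION_1. */
	if (!(s->device_features & s->driver_features & v1))
		return false;
	/* Only needed if config is read early on a non-LE guest. */
	return s->has_validate && s->guest_is_big_endian;
}
```

The version posted below only applies the first check, on purpose.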

--------------------------8<---------------------

From: Halil Pasic <pasic@linux.ibm.com>
Date: Thu, 30 Sep 2021 02:38:47 +0200
Subject: [PATCH] virtio: write back feature VERSION_1 before verify

This patch fixes a regression introduced by commit 82e89ea077b9
("virtio-blk: Add validation for block size in config space") and
enables similar checks in validate() on big endian platforms.

The problem with checking multi-byte config fields in the validate
callback, on big endian platforms and with a possibly transitional
device, is the following. The validate() callback is called between
config->get_features() and virtio_finalize_features(). If we have a
device that offered F_VERSION_1, then there are two options:
either the device is transitional, and then it has to present the legacy
interface, i.e. a big endian config space, until F_VERSION_1 is
negotiated; or we have a non-transitional device, which makes
F_VERSION_1 mandatory, only implements the non-legacy interface, and
thus presents a little endian config space. Because at this point we
can't know if the device is transitional or non-transitional, we can't
know whether we need to byte swap or not.

The virtio spec explicitly states that the driver MAY read config
between reading and writing the features, so forbidding config access
before feature negotiation is done is not an option. The
specification isn't clear about setting the features multiple times
before FEATURES_OK, so I guess it should be fine to set F_VERSION_1
early, since at this point we already know that we are about to
negotiate F_VERSION_1.

I don't consider this patch super clean, but frankly I don't think we
have a ton of options. Another option, which may or may not be cleaner
but is IMHO much uglier, is to figure out whether the device is
transitional by rejecting F_VERSION_1, then resetting it and proceeding
according to what we have figured out, hoping that the characteristics
of the device didn't change.

Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config space")
Reported-by: markver@us.ibm.com
---
 drivers/virtio/virtio.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 0a5b54034d4b..2b9358f2e22a 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -239,6 +239,12 @@ static int virtio_dev_probe(struct device *_d)
 		driver_features_legacy = driver_features;
 	}
 
+	/* Write F_VERSION_1 feature to pin down endianness */
+	if (device_features & (1ULL << VIRTIO_F_VERSION_1) & driver_features) {
+		dev->features = (1ULL << VIRTIO_F_VERSION_1);
+		dev->config->finalize_features(dev);
+	}
+
 	if (device_features & (1ULL << VIRTIO_F_VERSION_1))
 		dev->features = driver_features & device_features;
 	else
-- 
2.31.1





 


* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-02 18:13           ` Michael S. Tsirkin
  2021-10-04  2:23             ` Halil Pasic
@ 2021-10-04  7:01             ` Cornelia Huck
  2021-10-04  9:25               ` Halil Pasic
  1 sibling, 1 reply; 52+ messages in thread
From: Cornelia Huck @ 2021-10-04  7:01 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Halil Pasic, Jason Wang, Xie Yongji, virtualization,
	linux-kernel, markver, Christian Borntraeger, linux-s390

On Sat, Oct 02 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Fri, Oct 01, 2021 at 05:18:46PM +0200, Cornelia Huck wrote:
>> I'd say we need a hack here so that we assume little-endian config space
>> if VERSION_1 has been offered; if your patch here works, I assume QEMU
>> does what we expect (assuming little-endian as well.) I'm mostly
>> wondering what happens if you use a different VMM; can we expect it to
>> work similar to QEMU?
>
> Hard to say of course ... hopefully other VMMs are actually
> implementing the spec. E.g. IIUC rust vmm is modern only.

Yes, I kind of hope they are simply doing LE config space accesses.

Are there any other VMMs that are actually supported on s390x (or other
BE architectures)?



* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-04  2:23             ` Halil Pasic
@ 2021-10-04  9:07               ` Michael S. Tsirkin
  2021-10-05 10:06                 ` Cornelia Huck
  2021-10-05 10:43                 ` Halil Pasic
  0 siblings, 2 replies; 52+ messages in thread
From: Michael S. Tsirkin @ 2021-10-04  9:07 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Cornelia Huck, Jason Wang, Xie Yongji, virtualization,
	linux-kernel, markver, Christian Borntraeger, linux-s390,
	stefanha, qemu-devel, Raphael Norwitz

On Mon, Oct 04, 2021 at 04:23:23AM +0200, Halil Pasic wrote:
> On Sat, 2 Oct 2021 14:13:37 -0400
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > > Anyone else have an idea? This is a nasty regression; we could revert the
> > > patch, which would remove the symptoms and give us some time, but that
> > > doesn't really feel right, I'd do that only as a last resort.  
> > 
> > Well we have Halil's hack (except I would limit it
> > to only apply to BE, only do devices with validate,
> > and only in modern mode), and we will fix QEMU to be spec compliant.
> > Between these why do we need any conditional compiles?
> 
> We don't. As I stated before, this hack is flawed because it
> effectively breaks fencing features by the driver with QEMU. Some
> features can not be unset after once set, because we tend to try to
> enable the corresponding functionality whenever we see a write
> features operation with the feature bit set, and we don't disable, if a
> subsequent features write operation stores the feature bit as not set.

Something to fix in QEMU too, I think.

> But it looks like VIRTIO_1 is fine to get cleared afterwards.

We'd never clear it though - why would we?

> So my hack
> should actually look like posted below, modulo conditions.


Looking at it some more, I see that vhost-user actually
does not send features to the backend until FEATURES_OK.
However, the code in contrib for vhost-user-blk at least seems
broken wrt endian-ness ATM. What about other backends though?
Hard to be sure, right?
Cc Raphael and Stefan so they can take a look.
And I guess it's time we CC'd qemu-devel too.

For now I am beginning to think we should either revert or just limit
validation to LE and think about all this some more. And I am inclined
to do a revert. These are all hypervisors that have shipped for a long time.
Do we need a flag for early config space access then?



> 
> Regarding the conditions I guess checking that driver_features has
> F_VERSION_1 already satisfies "only modern mode", or?

Right.

> For now
> I've deliberately omitted the has verify and the is big endian
> conditions so we have a better chance to see if something breaks
> (i.e. the approach does not work). I can add in those extra conditions
> later.

Or maybe, if we go down that road, just the validate check (for
performance). I'm a bit unhappy we have the extra exit but consistency
seems more important.

> 
> --------------------------8<---------------------
> 
> From: Halil Pasic <pasic@linux.ibm.com>
> Date: Thu, 30 Sep 2021 02:38:47 +0200
> Subject: [PATCH] virtio: write back feature VERSION_1 before verify
> 
> This patch fixes a regression introduced by commit 82e89ea077b9
> ("virtio-blk: Add validation for block size in config space") and
> enables similar checks in verify() on big endian platforms.
> 
> The problem with checking multi-byte config fields in the verify
> callback, on big endian platforms, and with a possibly transitional
> device is the following. The verify() callback is called between
> config->get_features() and virtio_finalize_features(). That we have a
> device that offered F_VERSION_1 then we have the following options
> either the device is transitional, and then it has to present the legacy
> interface, i.e. a big endian config space until F_VERSION_1 is
> negotiated, or we have a non-transitional device, which makes
> F_VERSION_1 mandatory, and only implements the non-legacy interface and
> thus presents a little endian config space. Because at this point we
> can't know if the device is transitional or non-transitional, we can't
> know do we need to byte swap or not.

Well we established that we can know. Here's an alternative explanation:

	The virtio specification virtio-v1.1-cs01 states:

	Transitional devices MUST detect Legacy drivers by detecting that
	VIRTIO_F_VERSION_1 has not been acknowledged by the driver.
	This is exactly what QEMU as of 6.1 has done relying solely
	on VIRTIO_F_VERSION_1 for detecting that.

	However, the specification also says:
	driver MAY read (but MUST NOT write) the device-specific
	configuration fields to check that it can support the device before
	accepting it.

	In that case, any device relying solely on VIRTIO_F_VERSION_1
	for detecting legacy drivers will return data in legacy format.
	In particular, this implies that it is in big endian format
	for big endian guests. This naturally confuses the driver
	which expects little endian in the modern mode.

	It is probably a good idea to amend the spec to clarify that
	VIRTIO_F_VERSION_1 can only be relied on after the feature negotiation
	is complete. However, we already have a regression, so let's
	try to address it.
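
To make the confusion concrete, here is a small standalone sketch. It
is not kernel or QEMU code: put_be32() and get_le32() are illustrative
helpers, standing in for a device that still writes config in the
legacy (guest-endian, here big endian) format and a driver that reads
it as little endian in modern mode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit field the way a legacy device does for a BE guest. */
static void put_be32(uint8_t *buf, uint32_t v)
{
	assert(buf != NULL);
	buf[0] = (uint8_t)(v >> 24);
	buf[1] = (uint8_t)(v >> 16);
	buf[2] = (uint8_t)(v >> 8);
	buf[3] = (uint8_t)v;
}

/* Read a 32-bit field the way a modern driver does: always LE. */
static uint32_t get_le32(const uint8_t *buf)
{
	assert(buf != NULL);
	return (uint32_t)buf[0] |
	       ((uint32_t)buf[1] << 8) |
	       ((uint32_t)buf[2] << 16) |
	       ((uint32_t)buf[3] << 24);
}
```

A block size of 512 (0x00000200) stored with put_be32() reads back
through get_le32() as 0x00020000, i.e. 131072, which is the kind of
value a block size check in validate would then reject.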


> 
> The virtio spec explicitly states that the driver MAY read config
> between reading and writing the features so saying that first accessing
> the config before feature negotiation is done is not an option. The
> specification ain't clear about setting the features multiple times
> before FEATURES_OK, so I guess that should be fine to set F_VERSION_1
> since at this point we already know that we are about to negotiate
> F_VERSION_1.
> 
> I don't consider this patch super clean, but frankly I don't think we
> have a ton of options. Another option that may or may not be cleaner,
> but is also IMHO much uglier is to figure out whether the device is
> transitional by rejecting _F_VERSION_1, then resetting it and proceeding
> according to what we have figured out, hoping that the characteristics
> of the device didn't change.

An empty line before tags.

> Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
> Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config space")
> Reported-by: markver@us.ibm.com

Let's add more commits that are affected. E.g. virtio-net with MTU
feature bit set is affected too.

So let's add Fixes tag for:
commit 14de9d114a82a564b94388c95af79a701dc93134
Author: Aaron Conole <aconole@redhat.com>
Date:   Fri Jun 3 16:57:12 2016 -0400

    virtio-net: Add initial MTU advice feature
    
I think that's all, but pls double check me.


> ---
>  drivers/virtio/virtio.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 0a5b54034d4b..2b9358f2e22a 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -239,6 +239,12 @@ static int virtio_dev_probe(struct device *_d)
>  		driver_features_legacy = driver_features;
>  	}
>  
> +	/* Write F_VERSION_1 feature to pin down endianness */
> +	if (device_features & (1ULL << VIRTIO_F_VERSION_1) & driver_features) {
> +		dev->features = (1ULL << VIRTIO_F_VERSION_1);
> +		dev->config->finalize_features(dev);
> +	}
> +
>  	if (device_features & (1ULL << VIRTIO_F_VERSION_1))
>  		dev->features = driver_features & device_features;
>  	else
> -- 
> 2.31.1
> 
> 
> 
> 
> 
>  



* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-04  7:01             ` Cornelia Huck
@ 2021-10-04  9:25               ` Halil Pasic
  2021-10-04  9:51                 ` Cornelia Huck
  0 siblings, 1 reply; 52+ messages in thread
From: Halil Pasic @ 2021-10-04  9:25 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Michael S. Tsirkin, Jason Wang, Xie Yongji, virtualization,
	linux-kernel, markver, Christian Borntraeger, linux-s390,
	Halil Pasic

On Mon, 04 Oct 2021 09:01:42 +0200
Cornelia Huck <cohuck@redhat.com> wrote:

> On Sat, Oct 02 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Fri, Oct 01, 2021 at 05:18:46PM +0200, Cornelia Huck wrote:  
> >> I'd say we need a hack here so that we assume little-endian config space
> >> if VERSION_1 has been offered; if your patch here works, I assume QEMU
> >> does what we expect (assuming little-endian as well.) I'm mostly
> >> wondering what happens if you use a different VMM; can we expect it to
> >> work similar to QEMU?  
> >
> > Hard to say of course ... hopefully other VMMs are actually
> > implementing the spec. E.g. IIUC rust vmm is modern only.  
> 
> Yes, I kind of hope they are simply doing LE config space accesses.
> 
> Are there any other VMMs that are actually supported on s390x (or other
> BE architectures)?
> 

I think zCX (z/OS Container Extensions) is relevant as it uses virtio.
That is all I know about.

Regards,
Halil


* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-04  9:25               ` Halil Pasic
@ 2021-10-04  9:51                 ` Cornelia Huck
  0 siblings, 0 replies; 52+ messages in thread
From: Cornelia Huck @ 2021-10-04  9:51 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Michael S. Tsirkin, Jason Wang, Xie Yongji, virtualization,
	linux-kernel, markver, Christian Borntraeger, linux-s390,
	Halil Pasic

On Mon, Oct 04 2021, Halil Pasic <pasic@linux.ibm.com> wrote:

> On Mon, 04 Oct 2021 09:01:42 +0200
> Cornelia Huck <cohuck@redhat.com> wrote:
>
>> On Sat, Oct 02 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> 
>> > On Fri, Oct 01, 2021 at 05:18:46PM +0200, Cornelia Huck wrote:  
>> >> I'd say we need a hack here so that we assume little-endian config space
>> >> if VERSION_1 has been offered; if your patch here works, I assume QEMU
>> >> does what we expect (assuming little-endian as well.) I'm mostly
>> >> wondering what happens if you use a different VMM; can we expect it to
>> >> work similar to QEMU?  
>> >
>> > Hard to say of course ... hopefully other VMMs are actually
>> > implementing the spec. E.g. IIUC rust vmm is modern only.  
>> 
>> Yes, I kind of hope they are simply doing LE config space accesses.
>> 
>> Are there any other VMMs that are actually supported on s390x (or other
>> BE architectures)?
>> 
>
> I think zCX (z/OS Container Extensions) is relevant as it uses virtio.
> That is all I know about.

Ok, I'll assume that you (IBM) will be able to verify that any fixup
will continue to work there.



* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-03  7:26           ` Michael S. Tsirkin
@ 2021-10-04 12:01             ` Cornelia Huck
  2021-10-04 12:54               ` Michael S. Tsirkin
  0 siblings, 1 reply; 52+ messages in thread
From: Cornelia Huck @ 2021-10-04 12:01 UTC (permalink / raw)
  To: Michael S. Tsirkin, Halil Pasic
  Cc: Jason Wang, Xie Yongji, virtualization, linux-kernel, markver,
	Christian Borntraeger, linux-s390, virtio-dev

On Sun, Oct 03 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:

> Sent too early. So here's what I propose. Could you pls take a look
> and if you like this, post a ccw section?

We have not talked about the mmio transport so far, but I guess it
should be fine as legacy and standard are separated.

> There's also an attempt to prevent fallback from modern to legacy
> here since if driver does fallback then failing FEATURES_OK can't work
> properly.
> That's a separate issue, will be a separate patch when I post
> this for consideration by the TC.
>
>
> diff --git a/content.tex b/content.tex
> index 1398390..06271f4 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -140,10 +140,13 @@ \subsection{Legacy Interface: A Note on Feature
>  Bits}\label{sec:Basic Facilities of a Virtio Device / Feature
>  Bits / Legacy Interface: A Note on Feature Bits}
>  
> -Transitional Drivers MUST detect Legacy Devices by detecting that
> -the feature bit VIRTIO_F_VERSION_1 is not offered.
> -Transitional devices MUST detect Legacy drivers by detecting that
> -VIRTIO_F_VERSION_1 has not been acknowledged by the driver.
> +Transitional drivers MAY support operating legacy devices.
> +Transitional devices MAY support operation by legacy drivers.

Why 'MAY'? Isn't the whole point of transitional that it can deal with
both?

> +
> +Transitional drivers MUST detect legacy devices in a way that is
> +transport specific.
> +Transitional devices MUST detect legacy drivers in a way that
> +is transport specific.
>  
>  In this case device is used through the legacy interface.
>  
> @@ -160,6 +163,33 @@ \subsection{Legacy Interface: A Note on Feature
>  Specification text within these sections generally does not apply
>  to non-transitional devices.
>  
> +\begin{note}
> +The device offers different features when used through
> +the legacy interface and when operated in accordance with this
> +specification.
> +\end{note}
> +
> +Transitional drivers MUST use Devices only through the legacy interface

s/Devices only through the legacy interface/devices through the legacy
interface only/

?

> +if the feature bit VIRTIO_F_VERSION_1 is not offered.
> +Transitional devices MUST NOT offer VIRTIO_F_VERSION_1 when used through
> +the legacy interface.
> +
> +When the driver uses a device through the legacy interface, then it
> +MUST only accept the features the device offered through the
> +legacy interface.
> +
> +When used through the legacy interface, the device SHOULD
> +validate that the driver only accepted the features it
> +offered through the legacy interface.
> +
> +When operating a transitional device, a transitional driver
> +SHOULD NOT use the device through the legacy interface if
> +operation through the modern interface has failed.
> +In particular, a transitional driver
> +SHOULD NOT fall back to using the device through the
> +legacy interface if feature negotiation failed
> +(since that would defeat the purpose of the FEATURES_OK bit).
> +
>  \section{Notifications}\label{sec:Basic Facilities of a Virtio Device
>  / Notifications}
>  
> @@ -1003,6 +1033,12 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>  
>  The driver MUST NOT write a 0 to \field{queue_enable}.
>  
> +\paragraph{Legacy Interface: Common configuration structure layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Legacy Interface: Common configuration structure layout}
> +Transitional drivers SHOULD detect legacy devices by detecting
> +that the device has the Transitional PCI Device ID in
> +the range 0x1000 to 0x103f and lacks a VIRTIO_PCI_CAP_COMMON_CFG
> +capability specifying the location of a common configuration structure.
> +
>  \subsubsection{Notification structure layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability}
>  
>  The notification location is found using the VIRTIO_PCI_CAP_NOTIFY_CFG
> @@ -1288,6 +1324,10 @@ \subsubsection{Legacy Interfaces: A Note on PCI Device Layout}\label{sec:Virtio
>  Transitional devices MUST present part of configuration
>  registers in a legacy configuration structure in BAR0 in the first I/O
>  region of the PCI device, as documented below.
> +
> +Transitional devices SHOULD detect legacy drivers by detecting
> +access to the legacy configuration structure.
> +
>  When using the legacy interface, transitional drivers
>  MUST use the legacy configuration structure in BAR0 in the first
>  I/O region of the PCI device, as documented below.

Generally, looks good to me.
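
For the PCI part, the proposed detection rule could be read as
something like the following. This is only a sketch of the proposed
wording: struct pci_probe_info and both helpers are hypothetical names,
not existing driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of what a driver learned while probing PCI. */
struct pci_probe_info {
	uint16_t device_id;      /* PCI device ID */
	bool     has_common_cfg; /* VIRTIO_PCI_CAP_COMMON_CFG was found */
};

/* Transitional PCI device ID range named in the proposed text. */
static bool is_transitional_id(uint16_t id)
{
	return id >= 0x1000 && id <= 0x103f;
}

/* Legacy device: transitional ID and no common configuration structure. */
static bool looks_like_legacy_device(const struct pci_probe_info *p)
{
	assert(p != NULL);
	return is_transitional_id(p->device_id) && !p->has_common_cfg;
}
```

A device with a transitional ID that does provide the capability would
then be treated as transitional rather than legacy.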



* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-02 10:21     ` Michael S. Tsirkin
@ 2021-10-04 12:19       ` Cornelia Huck
  2021-10-04 13:11         ` Michael S. Tsirkin
  0 siblings, 1 reply; 52+ messages in thread
From: Cornelia Huck @ 2021-10-04 12:19 UTC (permalink / raw)
  To: Michael S. Tsirkin, Halil Pasic
  Cc: Jason Wang, Xie Yongji, virtualization, linux-kernel, markver,
	Christian Borntraeger, linux-s390, qemu-devel


[cc:qemu-devel]

On Sat, Oct 02 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Fri, Oct 01, 2021 at 09:21:25AM +0200, Halil Pasic wrote:
>> On Thu, 30 Sep 2021 07:12:21 -0400
>> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> 
>> > On Thu, Sep 30, 2021 at 03:20:49AM +0200, Halil Pasic wrote:
>> > > This patch fixes a regression introduced by commit 82e89ea077b9
>> > > ("virtio-blk: Add validation for block size in config space") and
>> > > enables similar checks in verify() on big endian platforms.
>> > > 
>> > > The problem with checking multi-byte config fields in the verify
>> > > callback, on big endian platforms, and with a possibly transitional
>> > > device is the following. The verify() callback is called between
>> > > config->get_features() and virtio_finalize_features(). That we have a
>> > > device that offered F_VERSION_1 then we have the following options
>> > > either the device is transitional, and then it has to present the legacy
>> > > interface, i.e. a big endian config space until F_VERSION_1 is
>> > > negotiated, or we have a non-transitional device, which makes
>> > > F_VERSION_1 mandatory, and only implements the non-legacy interface and
>> > > thus presents a little endian config space. Because at this point we
>> > > can't know if the device is transitional or non-transitional, we can't
>> > > know do we need to byte swap or not.  
>> > 
>> > Hmm which transport does this refer to?
>> 
>> It is the same with virtio-ccw and virtio-pci. I see the same problem
>> with both on s390x. I didn't try with virtio-blk-pci-non-transitional
>> yet (have to figure out how to do that with libvirt) for pci I used
>> virtio-blk-pci.
>> 
>> > Distinguishing between legacy and modern drivers is transport
>> > specific.  PCI presents
>> > legacy and modern at separate addresses so distinguishing
>> > between these two should be no trouble.
>> 
>> You mean the device id? Yes that is bolted down in the spec, but
>> currently we don't exploit that information. Furthermore there
>> is a fat chance that with QEMU even the allegedly non-transitional
>> devices only present a little endian config space after VERSION_1
>> was negotiated. Namely get_config for virtio-blk is implemented in
>> virtio_blk_update_config() which does virtio_stl_p(vdev,
>> &blkcfg.blk_size, blk_size) and in there we don't care
>> about transitional or not:
>> 
>> static inline bool virtio_access_is_big_endian(VirtIODevice *vdev)
>> {
>> #if defined(LEGACY_VIRTIO_IS_BIENDIAN)
>>     return virtio_is_big_endian(vdev);
>> #elif defined(TARGET_WORDS_BIGENDIAN)
>>     if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
>>         /* Devices conforming to VIRTIO 1.0 or later are always LE. */
>>         return false;
>>     }
>>     return true;
>> #else
>>     return false;
>> #endif
>> }
>> 
>
> ok so that's a QEMU bug. Any virtio 1.0 and up
> compatible device must use LE.
> It can also present a legacy config space where the
> endian depends on the guest.

So, how is the virtio core supposed to determine this? A
transport-specific callback?

>
>> > Channel i/o has versioning so same thing?
>> >
>> 
>> Don't think so. Both a transitional and a non-transitional device
>> would have to accept revisions higher than 0 if the driver tried to
>> negotiate those (and we do in our case).
>
> Yes, the modern driver does. And that one is known to be LE.
> legacy driver doesn't.
>
>> > > The virtio spec explicitly states that the driver MAY read config
>> > > between reading and writing the features so saying that first accessing
>> > > the config before feature negotiation is done is not an option. The
>> > > specification ain't clear about setting the features multiple times
>> > > before FEATURES_OK, so I guess that should be fine.
>> > > 
>> > > I don't consider this patch super clean, but frankly I don't think we
>> > > have a ton of options. Another option that may or may not be cleaner,
>> > > but is also IMHO much uglier is to figure out whether the device is
>> > > transitional by rejecting _F_VERSION_1, then resetting it and proceeding
>> > > according to what we have figured out, hoping that the characteristics
>> > > of the device didn't change.  
>> > 
>> > I am confused here. So is the problem at the device or at the driver level?
>> 
>> We have a driver regression. Since the 82e89ea077b9 ("virtio-blk: Add
>> validation for block size in config space") virtio-blk is broken on
>> s390.
>
> Because of a qemu bug. I agree. It's worth working around in the driver
> since the qemu bug has been around for a very long time.

Yes, since we introduced virtio 1 support, I guess...

>
>
>> The deeper problem is in the spec. We stated that the driver may read
>> config space before the feature negotiation is finalized, but we didn't
>> think enough about what happens when native endianness is not little
>> endian in the different cases.
>
> Because the spec is very clear that endian-ness is LE.
> I don't see a spec issue yet here, just an implementation issue.

Maybe not really a bug in the spec, but probably an issue, as this seems
to have been unclear to most people so far.

>
>> I believe, for non-transitional devices we have a problem in the host as
>> well (i.e. in QEMU).
>
> Because QEMU ignores the spec and instead relies on the feature
> negotiation.
>
>> 
>> > I suspect it's actually the host that has the issue, not
>> > the guest?
>> 
>> I tend to say we have a problem both in the host and in the guest. I'm
>> more concerned about the problem in the guest, because that is a really
>> nasty regression.
>
> The problem is in the guest. The bug is in the host ;)
>
>> For the host. I think for legacy we don't have a
>> problem, because both sides would operate on the assumption no
>> _F_VERSION_1, IMHO the implementation for the transitional devices is
>> correct.
>
> Well no, the point of transitional is really to be 1.0 compliant
> *and* also expose a legacy interface.

Worth noting that PCI and CCW are a tad different here: PCI exposes an
additional interface, while CCW uses a revision negotiation mechanism
(for CCW, legacy and standard-compliant are much closer on the transport
side than for PCI.) MMIO does not do transitional, if I'm not wrong.

>
>> For non-transitional flavor, it depends on the device. For
>> example virtio-net and virtio-blk is broken, because we use primitives
>> like virtio_stl_p() and those don't do the right thing before feature
>> negotiation is completed. On the other hand virtio-crypto.c as a truly
>> non-transitional device uses stl_le_p() and IMHO does the right thing.
>> 
>> Thanks for your comments! I hope I managed to answer your questions. I
>> need some guidance on how do we want to move forward on this.
>> 
>> Regards,
>> Halil
>
> OK so. I don't have a problem with the patch itself,
> assuming it's enough to work around all buggy hosts.
> I am especially worried about things like vhost/vhost-user,
> I suspect they might have a bug like this too, and
> I am not sure whether your work around is enough for these.
> Can you check please?
>
> If not we'll have to move all validate code to after FEATURES_OK
> is set.

What is supposed to happen for validate after FEATURES_OK? The driver
cannot change any features at that point in time, it can only fail to
use the device.

>
> We do however want to document that this API can be called
> multiple times since that was not the case
> previously.
>
> Also, I would limit this to when
> - the validate callback exists
> - the guest endian-ness is not LE
>
> We also want to document the QEMU bug in a comment here,
> e.g. 
>
> /*
>  * QEMU before version 6.2 incorrectly uses driver features with guest
>  * endian-ness to set endian-ness for config space instead of just using
>  * LE for the modern interface as per spec.
>  * This breaks reading config in the validate callback.
>  * To work around that, when device is 1.0 (so supposed to be LE)
>  * but guest is not LE, then send the features to device one extra
>  * time before validation.
>  */

Do we need to consider migration, or do we not need to be bug-compatible
in this case?

>
> Finally I'd like to see the QEMU bug fix before I merge this one,
> since it will be harder to test with a fix.
>
>
>
>
>> > 
>> > 
>> > > Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
>> > > Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config space")
>> > > Reported-by: markver@us.ibm.com
>> > > ---
>> > >  drivers/virtio/virtio.c | 4 ++++
>> > >  1 file changed, 4 insertions(+)
>> > > 
>> > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
>> > > index 0a5b54034d4b..9dc3cfa17b1c 100644
>> > > --- a/drivers/virtio/virtio.c
>> > > +++ b/drivers/virtio/virtio.c
>> > > @@ -249,6 +249,10 @@ static int virtio_dev_probe(struct device *_d)
>> > >  		if (device_features & (1ULL << i))
>> > >  			__virtio_set_bit(dev, i);
>> > >  
>> > > +	/* Write back features before validate to know endianness */
>> > > +	if (device_features & (1ULL << VIRTIO_F_VERSION_1))
>> > > +		dev->config->finalize_features(dev);
>> > > +
>> > >  	if (drv->validate) {
>> > >  		err = drv->validate(dev);
>> > >  		if (err)
>> > > 
>> > > base-commit: 02d5e016800d082058b3d3b7c3ede136cdc6ddcb
>> > > -- 
>> > > 2.25.1  
>> > 



* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-04 12:01             ` Cornelia Huck
@ 2021-10-04 12:54               ` Michael S. Tsirkin
  2021-10-04 14:27                 ` Cornelia Huck
  0 siblings, 1 reply; 52+ messages in thread
From: Michael S. Tsirkin @ 2021-10-04 12:54 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Halil Pasic, Jason Wang, Xie Yongji, virtualization,
	linux-kernel, markver, Christian Borntraeger, linux-s390,
	virtio-dev

On Mon, Oct 04, 2021 at 02:01:14PM +0200, Cornelia Huck wrote:
> On Sun, Oct 03 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > Sent too early. So here's what I propose. Could you pls take a look
> > and if you like this, post a ccw section?
> 
> We have not talked about the mmio transport so far, but I guess it
> should be fine as legacy and standard are separated.
> 
> > There's also an attempt to prevent fallback from modern to legacy
> > here since if driver does fallback then failing FEATURES_OK can't work
> > properly.
> > That's a separate issue, will be a separate patch when I post
> > this for consideration by the TC.
> >
> >
> > diff --git a/content.tex b/content.tex
> > index 1398390..06271f4 100644
> > --- a/content.tex
> > +++ b/content.tex
> > @@ -140,10 +140,13 @@ \subsection{Legacy Interface: A Note on Feature
> >  Bits}\label{sec:Basic Facilities of a Virtio Device / Feature
> >  Bits / Legacy Interface: A Note on Feature Bits}
> >  
> > -Transitional Drivers MUST detect Legacy Devices by detecting that
> > -the feature bit VIRTIO_F_VERSION_1 is not offered.
> > -Transitional devices MUST detect Legacy drivers by detecting that
> > -VIRTIO_F_VERSION_1 has not been acknowledged by the driver.
> > +Transitional drivers MAY support operating legacy devices.
> > +Transitional devices MAY support operation by legacy drivers.
> 
> Why 'MAY'? Isn't the whole point of transitional that it can deal with
> both?

I guess. OK we can make it MUST.

> > +
> > +Transitional drivers MUST detect legacy devices in a way that is
> > +transport specific.
> > +Transitional devices MUST detect legacy drivers in a way that
> > +is transport specific.
> >  
> >  In this case device is used through the legacy interface.
> >  
> > @@ -160,6 +163,33 @@ \subsection{Legacy Interface: A Note on Feature
> >  Specification text within these sections generally does not apply
> >  to non-transitional devices.
> >  
> > +\begin{note}
> > +The device offers different features when used through
> > +the legacy interface and when operated in accordance with this
> > +specification.
> > +\end{note}
> > +
> > +Transitional drivers MUST use Devices only through the legacy interface
> 
> s/Devices only through the legacy interface/devices through the legacy
> interface only/
> 
> ?

Both versions are actually confused, since how do you
find out that device does not offer VIRTIO_F_VERSION_1?

I think what this should really say is

Transitional drivers MUST NOT accept VIRTIO_F_VERSION_1 through
the legacy interface.


Does Linux actually satisfy this? Will it accept VIRTIO_F_VERSION_1
through the legacy interface if offered?

> > +if the feature bit VIRTIO_F_VERSION_1 is not offered.
> > +Transitional devices MUST NOT offer VIRTIO_F_VERSION_1 when used through
> > +the legacy interface.
> > +
> > +When the driver uses a device through the legacy interface, then it
> > +MUST only accept the features the device offered through the
> > +legacy interface.
> > +
> > +When used through the legacy interface, the device SHOULD
> > +validate that the driver only accepted the features it
> > +offered through the legacy interface.
> > +
> > +When operating a transitional device, a transitional driver
> > +SHOULD NOT use the device through the legacy interface if
> > +operation through the modern interface has failed.
> > +In particular, a transitional driver
> > +SHOULD NOT fall back to using the device through the
> > +legacy interface if feature negotiation failed
> > +(since that would defeat the purpose of the FEATURES_OK bit).
> > +
> >  \section{Notifications}\label{sec:Basic Facilities of a Virtio Device
> >  / Notifications}
> >  
> > @@ -1003,6 +1033,12 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
> >  
> >  The driver MUST NOT write a 0 to \field{queue_enable}.
> >  
> > +\paragraph{Legacy Interface: Common configuration structure layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Legacy Interface: Common configuration structure layout}
> > +Transitional drivers SHOULD detect legacy devices by detecting
> > +that the device has the Transitional PCI Device ID in
> > +the range 0x1000 to 0x103f and lacks a VIRTIO_PCI_CAP_COMMON_CFG
> > +capability specifying the location of a common configuration structure.
> > +
> >  \subsubsection{Notification structure layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability}
> >  
> >  The notification location is found using the VIRTIO_PCI_CAP_NOTIFY_CFG
> > @@ -1288,6 +1324,10 @@ \subsubsection{Legacy Interfaces: A Note on PCI Device Layout}\label{sec:Virtio
> >  Transitional devices MUST present part of configuration
> >  registers in a legacy configuration structure in BAR0 in the first I/O
> >  region of the PCI device, as documented below.
> > +
> > +Transitional devices SHOULD detect legacy drivers by detecting
> > +access to the legacy configuration structure.
> > +
> >  When using the legacy interface, transitional drivers
> >  MUST use the legacy configuration structure in BAR0 in the first
> >  I/O region of the PCI device, as documented below.
> 
> Generally, looks good to me.

Do we want to also add explanation that features can be
changed until FEATURES_OK?
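
For illustration only, the PCI detection rule proposed in the diff above
could be sketched in C roughly as follows. The struct and function names
are hypothetical and not taken from any existing driver; only the ID
range 0x1000 to 0x103f and the missing VIRTIO_PCI_CAP_COMMON_CFG
capability come from the proposed text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Transitional PCI device ID range quoted in the proposed spec text. */
#define TRANSITIONAL_ID_MIN 0x1000u
#define TRANSITIONAL_ID_MAX 0x103fu

/* Hypothetical, simplified summary of what a driver learned while probing. */
struct pci_probe_info {
    uint16_t device_id;      /* PCI device ID */
    bool has_common_cfg_cap; /* a VIRTIO_PCI_CAP_COMMON_CFG capability was found */
};

/* True if the device would have to be driven through the legacy interface. */
static bool pci_device_is_legacy(const struct pci_probe_info *info)
{
    bool transitional_id = info->device_id >= TRANSITIONAL_ID_MIN &&
                           info->device_id <= TRANSITIONAL_ID_MAX;

    return transitional_id && !info->has_common_cfg_cap;
}
```

The point of the sketch is that the decision uses transport-level facts
only and never looks at VIRTIO_F_VERSION_1.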


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-04 12:19       ` Cornelia Huck
@ 2021-10-04 13:11         ` Michael S. Tsirkin
  2021-10-04 14:33           ` Cornelia Huck
  2021-10-05  7:25           ` Halil Pasic
  0 siblings, 2 replies; 52+ messages in thread
From: Michael S. Tsirkin @ 2021-10-04 13:11 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Halil Pasic, Jason Wang, Xie Yongji, virtualization,
	linux-kernel, markver, Christian Borntraeger, linux-s390,
	qemu-devel

On Mon, Oct 04, 2021 at 02:19:55PM +0200, Cornelia Huck wrote:
> 
> [cc:qemu-devel]
> 
> On Sat, Oct 02 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Fri, Oct 01, 2021 at 09:21:25AM +0200, Halil Pasic wrote:
> >> On Thu, 30 Sep 2021 07:12:21 -0400
> >> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >> 
> >> > On Thu, Sep 30, 2021 at 03:20:49AM +0200, Halil Pasic wrote:
> >> > > This patch fixes a regression introduced by commit 82e89ea077b9
> >> > > ("virtio-blk: Add validation for block size in config space") and
> >> > > enables similar checks in verify() on big endian platforms.
> >> > > 
> >> > > The problem with checking multi-byte config fields in the verify
> >> > > callback, on big endian platforms, and with a possibly transitional
> >> > > device is the following. The verify() callback is called between
> >> > > config->get_features() and virtio_finalize_features(). If we have a
> >> > > device that offered F_VERSION_1, then we have the following options:
> >> > > either the device is transitional, and then it has to present the legacy
> >> > > interface, i.e. a big endian config space until F_VERSION_1 is
> >> > > negotiated, or we have a non-transitional device, which makes
> >> > > F_VERSION_1 mandatory, and only implements the non-legacy interface and
> >> > > thus presents a little endian config space. Because at this point we
> >> > > can't know if the device is transitional or non-transitional, we can't
> >> > > know whether we need to byte swap or not.  
> >> > 
> >> > Hmm which transport does this refer to?
> >> 
> >> It is the same with virtio-ccw and virtio-pci. I see the same problem
> >> with both on s390x. I didn't try with virtio-blk-pci-non-transitional
> >> yet (have to figure out how to do that with libvirt) for pci I used
> >> virtio-blk-pci.
> >> 
> >> > Distinguishing between legacy and modern drivers is transport
> >> > specific.  PCI presents
> >> > legacy and modern at separate addresses so distinguishing
> >> > between these two should be no trouble.
> >> 
> >> You mean the device id? Yes that is bolted down in the spec, but
> >> currently we don't exploit that information. Furthermore there
> >> is a fat chance that with QEMU even the allegedly non-transitional
> >> devices only present a little endian config space after VERSION_1
> >> was negotiated. Namely get_config for virtio-blk is implemented in
> >> virtio_blk_update_config() which does virtio_stl_p(vdev,
> >> &blkcfg.blk_size, blk_size) and in there we don't care
> >> about transitional or not:
> >> 
> >> static inline bool virtio_access_is_big_endian(VirtIODevice *vdev)
> >> {
> >> #if defined(LEGACY_VIRTIO_IS_BIENDIAN)
> >>     return virtio_is_big_endian(vdev);
> >> #elif defined(TARGET_WORDS_BIGENDIAN)
> >>     if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
> >>         /* Devices conforming to VIRTIO 1.0 or later are always LE. */
> >>         return false;
> >>     }
> >>     return true;
> >> #else
> >>     return false;
> >> #endif
> >> }
> >> 
> >
> > ok so that's a QEMU bug. Any virtio 1.0 and up
> > compatible device must use LE.
> > It can also present a legacy config space where the
> > endian depends on the guest.
> 
> So, how is the virtio core supposed to determine this? A
> transport-specific callback?

I'd say a field in VirtIODevice is easiest.
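
A rough sketch of that idea, assuming a flag that the transport sets
once it knows which interface is in use. Everything here is
hypothetical: struct vdev_sketch is a stand-in and not QEMU's
VirtIODevice, and the helper names are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the device state; not QEMU's VirtIODevice. */
struct vdev_sketch {
    bool legacy_interface; /* set by the transport, not derived from feature bits */
    bool guest_big_endian; /* native byte order of the guest */
};

/* Modern interface: always LE. Legacy interface: guest-native byte order. */
static bool config_is_big_endian(const struct vdev_sketch *vdev)
{
    return vdev->legacy_interface && vdev->guest_big_endian;
}

/* Decode a 32-bit config field stored at p in the byte order selected above. */
static uint32_t config_read32(const struct vdev_sketch *vdev, const uint8_t *p)
{
    if (config_is_big_endian(vdev))
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8) | (uint32_t)p[3];

    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

With a block size of 512 stored as LE bytes 00 02 00 00, the modern path
decodes 512, while a reader that wrongly assumes BE decodes 131072,
which is the kind of value a block size check would reject.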

> >
> >> > Channel i/o has versioning so same thing?
> >> >
> >> 
> >> Don't think so. Both a transitional and a non-transitional device
> >> would have to accept revisions higher than 0 if the driver tried to
> >> negotiate those (and we do in our case).
> >
> > Yes, the modern driver does. And that one is known to be LE.
> > legacy driver doesn't.
> >
> >> > > The virtio spec explicitly states that the driver MAY read config
> >> > > between reading and writing the features so saying that first accessing
> >> > > the config before feature negotiation is done is not an option. The
> >> > > specification ain't clear about setting the features multiple times
> >> > > before FEATURES_OK, so I guess that should be fine.
> >> > > 
> >> > > I don't consider this patch super clean, but frankly I don't think we
> >> > > have a ton of options. Another option that may or may not be cleaner,
> >> > > but is also IMHO much uglier is to figure out whether the device is
> >> > > transitional by rejecting _F_VERSION_1, then resetting it and proceeding
> >> > > according to what we have figured out, hoping that the characteristics
> >> > > of the device didn't change.  
> >> > 
> >> > I am confused here. So is the problem at the device or at the driver level?
> >> 
> >> We have a driver regression. Since the 82e89ea077b9 ("virtio-blk: Add
> >> validation for block size in config space") virtio-blk is broken on
> >> s390.
> >
> > Because of a qemu bug. I agree. It's worth working around in the driver
> > since the qemu bug has been around for a very long time.
> 
> Yes, since we introduced virtio 1 support, I guess...
> 
> >
> >
> >> The deeper problem is in the spec. We stated that the driver may read
> >> config space before the feature negotiation is finalized, but we didn't
> >> think enough about what happens when native endianness is not little
> >> endian in the different cases.
> >
> > Because the spec is very clear that endian-ness is LE.
> > I don't see a spec issue yet here, just an implementation issue.
> 
> Maybe not really a bug in the spec, but probably an issue, as this seems
> to have been unclear to most people so far.
> 
> >
> >> I believe, for non-transitional devices we have a problem in the host as
> >> well (i.e. in QEMU).
> >
> > Because QEMU ignores the spec and instead relies on the feature
> > negotiation.
> >
> >> 
> >> > I suspect it's actually the host that has the issue, not
> >> > the guest?
> >> 
> >> I tend to say we have a problem both in the host and in the guest. I'm
> >> more concerned about the problem in the guest, because that is a really
> >> nasty regression.
> >
> > The problem is in the guest. The bug is in the host ;)
> >
> >> For the host. I think for legacy we don't have a
> >> problem, because both sides would operate on the assumption no
> >> _F_VERSION_1, IMHO the implementation for the transitional devices is
> >> correct.
> >
> > Well no, the point of transitional is really to be 1.0 compliant
> > *and* also expose a legacy interface.
> 
> Worth noting that PCI and CCW are a tad different here: PCI exposes an
> additional interface, while CCW uses a revision negotiation mechanism
> (for CCW, legacy and standard-compliant are much closer on the transport
> side than for PCI.) MMIO does not do transitional, if I'm not wrong.

Right. It probably still uses VIRTIO_F_VERSION_1 and we need to
fix that.

> >
> >> For non-transitional flavor, it depends on the device. For
> >> example virtio-net and virtio-blk are broken, because we use primitives
> >> like virtio_stl_p() and those don't do the right thing before feature
> >> negotiation is completed. On the other hand virtio-crypto.c as a truly
> >> non-transitional device uses stl_le_p() and IMHO does the right thing.
> >> 
> >> Thanks for your comments! I hope I managed to answer your questions. I
> >> need some guidance on how do we want to move forward on this.
> >> 
> >> Regards,
> >> Halil
> >
> > OK so. I don't have a problem with the patch itself,
> > assuming it's enough to work around all buggy hosts.
> > I am especially worried about things like vhost/vhost-user,
> > I suspect they might have a bug like this too, and
> > I am not sure whether your work around is enough for these.
> > Can you check please?
> >
> > If not we'll have to move all validate code to after FEATURES_OK
> > is set.
> 
> What is supposed to happen for validate after FEATURES_OK? The driver
> cannot change any features at that point in time, it can only fail to
> use the device.

Fail to use the device. Need to tread carefully here of course,
we don't want to break working setups.

> >
> > We do however want to document that this API can be called
> > multiple times since that was not the case
> > previously.
> >
> > Also, I would limit this to when
> > - the validate callback exists
> > - the guest endian-ness is not LE
> >
> > We also want to document the QEMU bug in a comment here,
> > e.g. 
> >
> > /*
> >  * QEMU before version 6.2 incorrectly uses driver features with guest
> >  * endian-ness to set endian-ness for config space instead of just using
> >  * LE for the modern interface as per spec.
> >  * This breaks reading config in the validate callback.
> >  * To work around that, when device is 1.0 (so supposed to be LE)
> >  * but guest is not LE, then send the features to device one extra
> >  * time before validation.
> >  */
> 
> Do we need to consider migration, or do we not need to be bug-compatible
> in this case?

I suspect we don't need to be bug compatible, any driver
accessing config before FEATURES_OK is already broken ...
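
To make the limits suggested earlier in this message concrete (validate
callback exists, guest is not LE, device offers VERSION_1), the
condition for the extra feature write could look roughly like this. The
function name and its boolean parameters are hypothetical; this is a
sketch and not the code from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_F_VERSION_1 32 /* feature bit number defined by the virtio spec */

/*
 * Decide whether the features should be sent to the device one extra
 * time before validation, as a workaround for hosts that pick the
 * config space byte order from the driver-written features.
 */
static bool needs_early_feature_write(uint64_t device_features,
                                      bool has_validate,
                                      bool guest_little_endian)
{
    bool offers_v1 = (device_features & (1ULL << VIRTIO_F_VERSION_1)) != 0;

    return offers_v1 && has_validate && !guest_little_endian;
}
```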

> >
> > Finally I'd like to see the QEMU bug fix before I merge this one,
> > since it will be harder to test with a fix.
> >
> >
> >
> >
> >> > 
> >> > 
> >> > > Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
> >> > > Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config space")
> >> > > Reported-by: markver@us.ibm.com
> >> > > ---
> >> > >  drivers/virtio/virtio.c | 4 ++++
> >> > >  1 file changed, 4 insertions(+)
> >> > > 
> >> > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> >> > > index 0a5b54034d4b..9dc3cfa17b1c 100644
> >> > > --- a/drivers/virtio/virtio.c
> >> > > +++ b/drivers/virtio/virtio.c
> >> > > @@ -249,6 +249,10 @@ static int virtio_dev_probe(struct device *_d)
> >> > >  		if (device_features & (1ULL << i))
> >> > >  			__virtio_set_bit(dev, i);
> >> > >  
> >> > > +	/* Write back features before validate to know endianness */
> >> > > +	if (device_features & (1ULL << VIRTIO_F_VERSION_1))
> >> > > +		dev->config->finalize_features(dev);
> >> > > +
> >> > >  	if (drv->validate) {
> >> > >  		err = drv->validate(dev);
> >> > >  		if (err)
> >> > > 
> >> > > base-commit: 02d5e016800d082058b3d3b7c3ede136cdc6ddcb
> >> > > -- 
> >> > > 2.25.1  
> >> > 


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-04 12:54               ` Michael S. Tsirkin
@ 2021-10-04 14:27                 ` Cornelia Huck
  2021-10-04 15:05                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 52+ messages in thread
From: Cornelia Huck @ 2021-10-04 14:27 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Halil Pasic, Jason Wang, Xie Yongji, virtualization,
	linux-kernel, markver, Christian Borntraeger, linux-s390,
	virtio-dev

On Mon, Oct 04 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Mon, Oct 04, 2021 at 02:01:14PM +0200, Cornelia Huck wrote:
>> On Sun, Oct 03 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> > @@ -160,6 +163,33 @@ \subsection{Legacy Interface: A Note on Feature
>> >  Specification text within these sections generally does not apply
>> >  to non-transitional devices.
>> >  
>> > +\begin{note}
>> > +The device offers different features when used through
>> > +the legacy interface and when operated in accordance with this
>> > +specification.
>> > +\end{note}
>> > +
>> > +Transitional drivers MUST use Devices only through the legacy interface
>> 
>> s/Devices only through the legacy interface/devices through the legacy
>> interface only/
>> 
>> ?
>
> Both versions are actually confused, since how do you
> find out that device does not offer VIRTIO_F_VERSION_1?
>
> I think what this should really say is
>
> Transitional drivers MUST NOT accept VIRTIO_F_VERSION_1 through
> the legacy interface.

Ok, that makes sense.

Would it make sense that transitional drivers MUST accept VERSION_1
through the non-legacy interface? Or is that redundant?

>
>
> Does linux actually satisfy this? Will it accept VIRTIO_F_VERSION_1
> through the legacy interface if offered?

I think that the Linux drivers will not operate on feature bit 32+ if
they are in legacy mode?

>> 
>> Generally, looks good to me.
>
> Do we want to also add explanation that features can be
> changed until FEATURES_OK?

I always considered that to be implicit, as feature negotiation is not
over until we have FEATURES_OK. Not sure whether we need an extra note.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-04 13:11         ` Michael S. Tsirkin
@ 2021-10-04 14:33           ` Cornelia Huck
  2021-10-04 15:07             ` Michael S. Tsirkin
  2021-10-05  7:25           ` Halil Pasic
  1 sibling, 1 reply; 52+ messages in thread
From: Cornelia Huck @ 2021-10-04 14:33 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Halil Pasic, Jason Wang, Xie Yongji, virtualization,
	linux-kernel, markver, Christian Borntraeger, linux-s390,
	qemu-devel

On Mon, Oct 04 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Mon, Oct 04, 2021 at 02:19:55PM +0200, Cornelia Huck wrote:
>> 
>> [cc:qemu-devel]
>> 
>> On Sat, Oct 02 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> 
>> > On Fri, Oct 01, 2021 at 09:21:25AM +0200, Halil Pasic wrote:
>> >> On Thu, 30 Sep 2021 07:12:21 -0400
>> >> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> >> 
>> >> > On Thu, Sep 30, 2021 at 03:20:49AM +0200, Halil Pasic wrote:
>> >> > > This patch fixes a regression introduced by commit 82e89ea077b9
>> >> > > ("virtio-blk: Add validation for block size in config space") and
>> >> > > enables similar checks in verify() on big endian platforms.
>> >> > > 
>> >> > > The problem with checking multi-byte config fields in the verify
>> >> > > callback, on big endian platforms, and with a possibly transitional
>> >> > > device is the following. The verify() callback is called between
>> >> > > config->get_features() and virtio_finalize_features(). If we have a
>> >> > > device that offered F_VERSION_1, then we have the following options:
>> >> > > either the device is transitional, and then it has to present the legacy
>> >> > > interface, i.e. a big endian config space until F_VERSION_1 is
>> >> > > negotiated, or we have a non-transitional device, which makes
>> >> > > F_VERSION_1 mandatory, and only implements the non-legacy interface and
>> >> > > thus presents a little endian config space. Because at this point we
>> >> > > can't know if the device is transitional or non-transitional, we can't
>> >> > > know whether we need to byte swap or not.  
>> >> > 
>> >> > Hmm which transport does this refer to?
>> >> 
>> >> It is the same with virtio-ccw and virtio-pci. I see the same problem
>> >> with both on s390x. I didn't try with virtio-blk-pci-non-transitional
>> >> yet (have to figure out how to do that with libvirt) for pci I used
>> >> virtio-blk-pci.
>> >> 
>> >> > Distinguishing between legacy and modern drivers is transport
>> >> > specific.  PCI presents
>> >> > legacy and modern at separate addresses so distinguishing
>> >> > between these two should be no trouble.
>> >> 
>> >> You mean the device id? Yes that is bolted down in the spec, but
>> >> currently we don't exploit that information. Furthermore there
>> >> is a fat chance that with QEMU even the allegedly non-transitional
>> >> devices only present a little endian config space after VERSION_1
>> >> was negotiated. Namely get_config for virtio-blk is implemented in
>> >> virtio_blk_update_config() which does virtio_stl_p(vdev,
>> >> &blkcfg.blk_size, blk_size) and in there we don't care
>> >> about transitional or not:
>> >> 
>> >> static inline bool virtio_access_is_big_endian(VirtIODevice *vdev)
>> >> {
>> >> #if defined(LEGACY_VIRTIO_IS_BIENDIAN)
>> >>     return virtio_is_big_endian(vdev);
>> >> #elif defined(TARGET_WORDS_BIGENDIAN)
>> >>     if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
>> >>         /* Devices conforming to VIRTIO 1.0 or later are always LE. */
>> >>         return false;
>> >>     }
>> >>     return true;
>> >> #else
>> >>     return false;
>> >> #endif
>> >> }
>> >> 
>> >
>> > ok so that's a QEMU bug. Any virtio 1.0 and up
>> > compatible device must use LE.
>> > It can also present a legacy config space where the
>> > endian depends on the guest.
>> 
>> So, how is the virtio core supposed to determine this? A
>> transport-specific callback?
>
> I'd say a field in VirtIODevice is easiest.

The transport needs to set this as soon as it has figured out whether
we're using legacy or not. I guess we also need to fence off any
accesses, or error out the device, if the driver tries any
read/write operations that would depend on that knowledge?
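
Just to illustrate what such fencing could mean, here is a small sketch
with a three-state interface mode. All names are hypothetical and this
is not QEMU code: a multi-byte config read fails, and the device is
marked broken, for as long as the transport has not decided between
legacy and modern.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum iface_mode { IFACE_UNKNOWN, IFACE_LEGACY, IFACE_MODERN };

/* Hypothetical device model used only for this sketch. */
struct fenced_dev {
    enum iface_mode mode;  /* set by the transport once it knows */
    bool guest_big_endian; /* native byte order of the guest */
    bool broken;           /* set when an access had to be rejected */
    uint8_t config[8];     /* raw config space bytes */
};

/*
 * Read a 16-bit config field. Returns 0 on success and -1 on failure.
 * An access while the mode is still unknown marks the device broken;
 * an out-of-range offset only fails the read.
 */
static int fenced_config_read16(struct fenced_dev *d, size_t off, uint16_t *out)
{
    if (d->mode == IFACE_UNKNOWN) {
        d->broken = true;
        return -1;
    }
    if (off > sizeof(d->config) - 2)
        return -1;

    const uint8_t *p = d->config + off;
    bool big_endian = d->mode == IFACE_LEGACY && d->guest_big_endian;

    *out = big_endian ? (uint16_t)((p[0] << 8) | p[1])
                      : (uint16_t)(p[0] | (p[1] << 8));
    return 0;
}
```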

And using a field in VirtIODevice would probably need some care when
migrating. Hm...


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-04 14:27                 ` Cornelia Huck
@ 2021-10-04 15:05                   ` Michael S. Tsirkin
  2021-10-04 15:45                     ` [virtio-dev] " Cornelia Huck
  0 siblings, 1 reply; 52+ messages in thread
From: Michael S. Tsirkin @ 2021-10-04 15:05 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Halil Pasic, Jason Wang, Xie Yongji, virtualization,
	linux-kernel, markver, Christian Borntraeger, linux-s390,
	virtio-dev

On Mon, Oct 04, 2021 at 04:27:23PM +0200, Cornelia Huck wrote:
> On Mon, Oct 04 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Mon, Oct 04, 2021 at 02:01:14PM +0200, Cornelia Huck wrote:
> >> On Sun, Oct 03 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >> > @@ -160,6 +163,33 @@ \subsection{Legacy Interface: A Note on Feature
> >> >  Specification text within these sections generally does not apply
> >> >  to non-transitional devices.
> >> >  
> >> > +\begin{note}
> >> > +The device offers different features when used through
> >> > +the legacy interface and when operated in accordance with this
> >> > +specification.
> >> > +\end{note}
> >> > +
> >> > +Transitional drivers MUST use Devices only through the legacy interface
> >> 
> >> s/Devices only through the legacy interface/devices through the legacy
> >> interface only/
> >> 
> >> ?
> >
> > Both versions are actually confused, since how do you
> > find out that device does not offer VIRTIO_F_VERSION_1?
> >
> > I think what this should really say is
> >
> > Transitional drivers MUST NOT accept VIRTIO_F_VERSION_1 through
> > the legacy interface.
> 
> Ok, that makes sense.
> 
> Would it make sense that transitional drivers MUST accept VERSION_1
> through the non-legacy interface? Or is that redundant?

We already have:

A driver MUST accept VIRTIO_F_VERSION_1 if it is offered.


> >
> >
> > Does linux actually satisfy this? Will it accept VIRTIO_F_VERSION_1
> > through the legacy interface if offered?
> 
> I think that the Linux drivers will not operate on feature bit 32+ if
> they are in legacy mode?


Well ... with PCI there's no *way* for host to set bit 32 through
legacy. But it might be possible with MMIO/CCW. Can you tell me
what happens then?
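
As a sketch of the behaviour I would expect from a driver here, assuming
the legacy interface carries only the low 32 feature bits: bit 32
(VIRTIO_F_VERSION_1) can never end up accepted in legacy mode, whatever
the device claims to offer. The function name and the legacy flag are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_F_VERSION_1 32 /* feature bit number defined by the virtio spec */

/* Compute the feature set a driver would write back to the device. */
static uint64_t accepted_features(uint64_t offered, uint64_t supported,
                                  bool legacy)
{
    uint64_t accepted = offered & supported;

    /* Legacy interface: keep the low 32 bits only, dropping VERSION_1. */
    if (legacy)
        accepted &= 0xffffffffULL;

    return accepted;
}
```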


> >> 
> >> Generally, looks good to me.
> >
> > Do we want to also add explanation that features can be
> > changed until FEATURES_OK?
> 
> I always considered that to be implicit, as feature negotiation is not
> over until we have FEATURES_OK. Not sure whether we need an extra note.

Well Halil here says once you set a feature bit you can't clear it.
So maybe not ...

-- 
MST


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-04 14:33           ` Cornelia Huck
@ 2021-10-04 15:07             ` Michael S. Tsirkin
  2021-10-04 15:50               ` Cornelia Huck
  0 siblings, 1 reply; 52+ messages in thread
From: Michael S. Tsirkin @ 2021-10-04 15:07 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Halil Pasic, Jason Wang, Xie Yongji, virtualization,
	linux-kernel, markver, Christian Borntraeger, linux-s390,
	qemu-devel

On Mon, Oct 04, 2021 at 04:33:21PM +0200, Cornelia Huck wrote:
> On Mon, Oct 04 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Mon, Oct 04, 2021 at 02:19:55PM +0200, Cornelia Huck wrote:
> >> 
> >> [cc:qemu-devel]
> >> 
> >> On Sat, Oct 02 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >> 
> >> > On Fri, Oct 01, 2021 at 09:21:25AM +0200, Halil Pasic wrote:
> >> >> On Thu, 30 Sep 2021 07:12:21 -0400
> >> >> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >> >> 
> >> >> > On Thu, Sep 30, 2021 at 03:20:49AM +0200, Halil Pasic wrote:
> >> >> > > This patch fixes a regression introduced by commit 82e89ea077b9
> >> >> > > ("virtio-blk: Add validation for block size in config space") and
> >> >> > > enables similar checks in verify() on big endian platforms.
> >> >> > > 
> >> >> > > The problem with checking multi-byte config fields in the verify
> >> >> > > callback, on big endian platforms, and with a possibly transitional
> >> >> > > device is the following. The verify() callback is called between
> >> >> > > config->get_features() and virtio_finalize_features(). If we have a
> >> >> > > device that offered F_VERSION_1, then we have the following options:
> >> >> > > either the device is transitional, and then it has to present the legacy
> >> >> > > interface, i.e. a big endian config space until F_VERSION_1 is
> >> >> > > negotiated, or we have a non-transitional device, which makes
> >> >> > > F_VERSION_1 mandatory, and only implements the non-legacy interface and
> >> >> > > thus presents a little endian config space. Because at this point we
> >> >> > > can't know if the device is transitional or non-transitional, we can't
> >> >> > > know whether we need to byte swap or not.  
> >> >> > 
> >> >> > Hmm which transport does this refer to?
> >> >> 
> >> >> It is the same with virtio-ccw and virtio-pci. I see the same problem
> >> >> with both on s390x. I didn't try with virtio-blk-pci-non-transitional
> >> >> yet (have to figure out how to do that with libvirt) for pci I used
> >> >> virtio-blk-pci.
> >> >> 
> >> >> > Distinguishing between legacy and modern drivers is transport
> >> >> > specific.  PCI presents
> >> >> > legacy and modern at separate addresses so distinguishing
> >> >> > between these two should be no trouble.
> >> >> 
> >> >> You mean the device id? Yes that is bolted down in the spec, but
> >> >> currently we don't exploit that information. Furthermore there
> >> >> is a fat chance that with QEMU even the allegedly non-transitional
> >> >> devices only present a little endian config space after VERSION_1
> >> >> was negotiated. Namely get_config for virtio-blk is implemented in
> >> >> virtio_blk_update_config() which does virtio_stl_p(vdev,
> >> >> &blkcfg.blk_size, blk_size) and in there we don't care
> >> >> about transitional or not:
> >> >> 
> >> >> static inline bool virtio_access_is_big_endian(VirtIODevice *vdev)
> >> >> {
> >> >> #if defined(LEGACY_VIRTIO_IS_BIENDIAN)
> >> >>     return virtio_is_big_endian(vdev);
> >> >> #elif defined(TARGET_WORDS_BIGENDIAN)
> >> >>     if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
> >> >>         /* Devices conforming to VIRTIO 1.0 or later are always LE. */
> >> >>         return false;
> >> >>     }
> >> >>     return true;
> >> >> #else
> >> >>     return false;
> >> >> #endif
> >> >> }
> >> >> 
> >> >
> >> > ok so that's a QEMU bug. Any virtio 1.0 and up
> >> > compatible device must use LE.
> >> > It can also present a legacy config space where the
> >> > endian depends on the guest.
> >> 
> >> So, how is the virtio core supposed to determine this? A
> >> transport-specific callback?
> >
> > I'd say a field in VirtIODevice is easiest.
> 
> The transport needs to set this as soon as it has figured out whether
> we're using legacy or not.

Basically on each device config access?

> I guess we also need to fence off any
> accesses, or error out the device, if the driver tries any
> read/write operations that would depend on that knowledge?
> 
> And using a field in VirtIODevice would probably need some care when
> migrating. Hm...

It's just a shorthand to minimize changes. No need to migrate I think.

-- 
MST


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-04 15:05                   ` Michael S. Tsirkin
@ 2021-10-04 15:45                     ` Cornelia Huck
  2021-10-04 20:01                       ` Michael S. Tsirkin
  0 siblings, 1 reply; 52+ messages in thread
From: Cornelia Huck @ 2021-10-04 15:45 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Halil Pasic, Jason Wang, Xie Yongji, virtualization,
	linux-kernel, markver, Christian Borntraeger, linux-s390,
	virtio-dev

On Mon, Oct 04 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Mon, Oct 04, 2021 at 04:27:23PM +0200, Cornelia Huck wrote:
>> On Mon, Oct 04 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> 
>> > On Mon, Oct 04, 2021 at 02:01:14PM +0200, Cornelia Huck wrote:
>> >> On Sun, Oct 03 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> >> > @@ -160,6 +163,33 @@ \subsection{Legacy Interface: A Note on Feature
>> >> >  Specification text within these sections generally does not apply
>> >> >  to non-transitional devices.
>> >> >  
>> >> > +\begin{note}
>> >> > +The device offers different features when used through
>> >> > +the legacy interface and when operated in accordance with this
>> >> > +specification.
>> >> > +\end{note}
>> >> > +
>> >> > +Transitional drivers MUST use Devices only through the legacy interface
>> >> 
>> >> s/Devices only through the legacy interface/devices through the legacy
>> >> interface only/
>> >> 
>> >> ?
>> >
>> > Both versions are actually confused, since how do you
>> > find out that device does not offer VIRTIO_F_VERSION_1?
>> >
>> > I think what this should really say is
>> >
>> > Transitional drivers MUST NOT accept VIRTIO_F_VERSION_1 through
>> > the legacy interface.
>> 
>> Ok, that makes sense.
>> 
>> Would it make sense that transitional drivers MUST accept VERSION_1
>> through the non-legacy interface? Or is that redundant?
>
> We already have:
>
> A driver MUST accept VIRTIO_F_VERSION_1 if it is offered.

Yep, so it is redundant.

>
>
>> >
>> >
>> > Does linux actually satisfy this? Will it accept VIRTIO_F_VERSION_1
>> > through the legacy interface if offered?
>> 
>> I think that the Linux drivers will not operate on feature bit 32+ if
>> they are in legacy mode?
>
>
> Well ... with PCI there's no *way* for host to set bit 32 through
> legacy. But it might be possible with MMIO/CCW. Can you tell me
> what happens then?

ccw does not support accessing bit 32+, either. Not sure about mmio.

>
>
>> >> 
>> >> Generally, looks good to me.
>> >
>> > Do we want to also add explanation that features can be
>> > changed until FEATURES_OK?
>> 
>> I always considered that to be implicit, as feature negotiation is not
>> over until we have FEATURES_OK. Not sure whether we need an extra note.
>
> Well Halil here says once you set a feature bit you can't clear it.
> So maybe not ...

Ok, so what about something like

"If FEATURES_OK is not set, the driver MAY change the set of features it
accepts."

in the device initialization section?



* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-04 15:07             ` Michael S. Tsirkin
@ 2021-10-04 15:50               ` Cornelia Huck
  2021-10-04 19:17                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 52+ messages in thread
From: Cornelia Huck @ 2021-10-04 15:50 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Halil Pasic, Jason Wang, Xie Yongji, virtualization,
	linux-kernel, markver, Christian Borntraeger, linux-s390,
	qemu-devel

On Mon, Oct 04 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Mon, Oct 04, 2021 at 04:33:21PM +0200, Cornelia Huck wrote:
>> On Mon, Oct 04 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> 
>> > On Mon, Oct 04, 2021 at 02:19:55PM +0200, Cornelia Huck wrote:
>> >> 
>> >> [cc:qemu-devel]
>> >> 
>> >> On Sat, Oct 02 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> >> 
>> >> > On Fri, Oct 01, 2021 at 09:21:25AM +0200, Halil Pasic wrote:
>> >> >> On Thu, 30 Sep 2021 07:12:21 -0400
>> >> >> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> >> >> 
>> >> >> > On Thu, Sep 30, 2021 at 03:20:49AM +0200, Halil Pasic wrote:
>> >> >> > > This patch fixes a regression introduced by commit 82e89ea077b9
>> >> >> > > ("virtio-blk: Add validation for block size in config space") and
>> >> >> > > enables similar checks in verify() on big endian platforms.
>> >> >> > > 
>> >> >> > > The problem with checking multi-byte config fields in the verify
>> >> >> > > callback, on big endian platforms, and with a possibly transitional
>> >> >> > > device is the following. The verify() callback is called between
>> >> >> > > config->get_features() and virtio_finalize_features(). That we have a
>> >> >> > > device that offered F_VERSION_1 then we have the following options
>> >> >> > > either the device is transitional, and then it has to present the legacy
>> >> >> > > interface, i.e. a big endian config space until F_VERSION_1 is
>> >> >> > > negotiated, or we have a non-transitional device, which makes
>> >> >> > > F_VERSION_1 mandatory, and only implements the non-legacy interface and
>> >> >> > > thus presents a little endian config space. Because at this point we
>> >> >> > > can't know if the device is transitional or non-transitional, we can't
>> >> >> > > know do we need to byte swap or not.  
>> >> >> > 
>> >> >> > Hmm which transport does this refer to?
>> >> >> 
>> >> >> It is the same with virtio-ccw and virtio-pci. I see the same problem
>> >> >> with both on s390x. I didn't try with virtio-blk-pci-non-transitional
>> >> >> yet (have to figure out how to do that with libvirt) for pci I used
>> >> >> virtio-blk-pci.
>> >> >> 
>> >> >> > Distinguishing between legacy and modern drivers is transport
>> >> >> > specific.  PCI presents
>> >> >> > legacy and modern at separate addresses so distinguishing
>> >> >> > between these two should be no trouble.
>> >> >> 
>> >> >> You mean the device id? Yes that is bolted down in the spec, but
>> >> >> currently we don't exploit that information. Furthermore there
>> >> >> is a fat chance that with QEMU even the allegedly non-transitional
>> >> >> devices only present a little endian config space after VERSION_1
>> >> >> was negotiated. Namely get_config for virtio-blk is implemented in
>> >> >> virtio_blk_update_config() which does virtio_stl_p(vdev,
>> >> >> &blkcfg.blk_size, blk_size) and in there we don't care
>> >> >> about transitional or not:
>> >> >> 
>> >> >> static inline bool virtio_access_is_big_endian(VirtIODevice *vdev)
>> >> >> {
>> >> >> #if defined(LEGACY_VIRTIO_IS_BIENDIAN)
>> >> >>     return virtio_is_big_endian(vdev);
>> >> >> #elif defined(TARGET_WORDS_BIGENDIAN)
>> >> >>     if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
>> >> >>         /* Devices conforming to VIRTIO 1.0 or later are always LE. */
>> >> >>         return false;
>> >> >>     }
>> >> >>     return true;
>> >> >> #else
>> >> >>     return false;
>> >> >> #endif
>> >> >> }
>> >> >> 
>> >> >
>> >> > ok so that's a QEMU bug. Any virtio 1.0 and up
>> >> > compatible device must use LE.
>> >> > It can also present a legacy config space where the
>> >> > endian depends on the guest.
>> >> 
>> >> So, how is the virtio core supposed to determine this? A
>> >> transport-specific callback?
>> >
>> > I'd say a field in VirtIODevice is easiest.
>> 
>> The transport needs to set this as soon as it has figured out whether
>> we're using legacy or not.
>
> Basically on each device config access?

Prior to the first one, I think. It should not change again, should it?

>
>> I guess we also need to fence off any
>> accesses respectively error out the device if the driver tries any
>> read/write operations that would depend on that knowledge?
>> 
>> And using a field in VirtIODevice would probably need some care when
>> migrating. Hm...
>
> It's just a shorthand to minimize changes. No need to migrate I think.

If we migrate in from an older QEMU, we don't know whether we are
dealing with legacy or not, until feature negotiation is already
done... don't we have to ask the transport?



* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-04 15:50               ` Cornelia Huck
@ 2021-10-04 19:17                 ` Michael S. Tsirkin
  2021-10-06 10:13                   ` Cornelia Huck
  0 siblings, 1 reply; 52+ messages in thread
From: Michael S. Tsirkin @ 2021-10-04 19:17 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Halil Pasic, Jason Wang, Xie Yongji, virtualization,
	linux-kernel, markver, Christian Borntraeger, linux-s390,
	qemu-devel

On Mon, Oct 04, 2021 at 05:50:44PM +0200, Cornelia Huck wrote:
> On Mon, Oct 04 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Mon, Oct 04, 2021 at 04:33:21PM +0200, Cornelia Huck wrote:
> >> On Mon, Oct 04 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >> 
> >> > On Mon, Oct 04, 2021 at 02:19:55PM +0200, Cornelia Huck wrote:
> >> >> 
> >> >> [cc:qemu-devel]
> >> >> 
> >> >> On Sat, Oct 02 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >> >> 
> >> >> > On Fri, Oct 01, 2021 at 09:21:25AM +0200, Halil Pasic wrote:
> >> >> >> On Thu, 30 Sep 2021 07:12:21 -0400
> >> >> >> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >> >> >> 
> >> >> >> > On Thu, Sep 30, 2021 at 03:20:49AM +0200, Halil Pasic wrote:
> >> >> >> > > This patch fixes a regression introduced by commit 82e89ea077b9
> >> >> >> > > ("virtio-blk: Add validation for block size in config space") and
> >> >> >> > > enables similar checks in verify() on big endian platforms.
> >> >> >> > > 
> >> >> >> > > The problem with checking multi-byte config fields in the verify
> >> >> >> > > callback, on big endian platforms, and with a possibly transitional
> >> >> >> > > device is the following. The verify() callback is called between
> >> >> >> > > config->get_features() and virtio_finalize_features(). That we have a
> >> >> >> > > device that offered F_VERSION_1 then we have the following options
> >> >> >> > > either the device is transitional, and then it has to present the legacy
> >> >> >> > > interface, i.e. a big endian config space until F_VERSION_1 is
> >> >> >> > > negotiated, or we have a non-transitional device, which makes
> >> >> >> > > F_VERSION_1 mandatory, and only implements the non-legacy interface and
> >> >> >> > > thus presents a little endian config space. Because at this point we
> >> >> >> > > can't know if the device is transitional or non-transitional, we can't
> >> >> >> > > know do we need to byte swap or not.  
> >> >> >> > 
> >> >> >> > Hmm which transport does this refer to?
> >> >> >> 
> >> >> >> It is the same with virtio-ccw and virtio-pci. I see the same problem
> >> >> >> with both on s390x. I didn't try with virtio-blk-pci-non-transitional
> >> >> >> yet (have to figure out how to do that with libvirt) for pci I used
> >> >> >> virtio-blk-pci.
> >> >> >> 
> >> >> >> > Distinguishing between legacy and modern drivers is transport
> >> >> >> > specific.  PCI presents
> >> >> >> > legacy and modern at separate addresses so distinguishing
> >> >> >> > between these two should be no trouble.
> >> >> >> 
> >> >> >> You mean the device id? Yes that is bolted down in the spec, but
> >> >> >> currently we don't exploit that information. Furthermore there
> >> >> >> is a fat chance that with QEMU even the allegedly non-transitional
> >> >> >> devices only present a little endian config space after VERSION_1
> >> >> >> was negotiated. Namely get_config for virtio-blk is implemented in
> >> >> >> virtio_blk_update_config() which does virtio_stl_p(vdev,
> >> >> >> &blkcfg.blk_size, blk_size) and in there we don't care
> >> >> >> about transitional or not:
> >> >> >> 
> >> >> >> static inline bool virtio_access_is_big_endian(VirtIODevice *vdev)
> >> >> >> {
> >> >> >> #if defined(LEGACY_VIRTIO_IS_BIENDIAN)
> >> >> >>     return virtio_is_big_endian(vdev);
> >> >> >> #elif defined(TARGET_WORDS_BIGENDIAN)
> >> >> >>     if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
> >> >> >>         /* Devices conforming to VIRTIO 1.0 or later are always LE. */
> >> >> >>         return false;
> >> >> >>     }
> >> >> >>     return true;
> >> >> >> #else
> >> >> >>     return false;
> >> >> >> #endif
> >> >> >> }
> >> >> >> 
> >> >> >
> >> >> > ok so that's a QEMU bug. Any virtio 1.0 and up
> >> >> > compatible device must use LE.
> >> >> > It can also present a legacy config space where the
> >> >> > endian depends on the guest.
> >> >> 
> >> >> So, how is the virtio core supposed to determine this? A
> >> >> transport-specific callback?
> >> >
> >> > I'd say a field in VirtIODevice is easiest.
> >> 
> >> The transport needs to set this as soon as it has figured out whether
> >> we're using legacy or not.
> >
> > Basically on each device config access?
> 
> Prior to the first one, I think. It should not change again, should it?

Well yes but we never prohibited someone from poking at both ..
Doing it on each access means we don't have state to migrate.
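
A standalone sketch of the per-access idea (hypothetical names, not the
actual QEMU interfaces): the transport reports which interface the access
came through, and the byte order follows from that alone, so there is
nothing cached that would need migrating.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: which interface the transport saw this access on. */
enum cfg_iface {
	CFG_IFACE_LEGACY,
	CFG_IFACE_MODERN,
};

/* Modern config space is always LE; legacy follows the guest. */
bool cfg_access_is_big_endian(enum cfg_iface iface, bool guest_big_endian)
{
	if (iface == CFG_IFACE_MODERN)
		return false;
	return guest_big_endian;
}

/* Store a 32-bit config field in the byte order for this access. */
void cfg_store32(uint8_t *dst, uint32_t v, enum cfg_iface iface,
		 bool guest_big_endian)
{
	bool be = cfg_access_is_big_endian(iface, guest_big_endian);
	int i;

	assert(dst != NULL);
	for (i = 0; i < 4; i++)
		dst[i] = (uint8_t)(v >> (be ? 24 - 8 * i : 8 * i));
}
```

Because the decision is a pure function of the access itself, a driver
poking at both interfaces simply gets the matching byte order each time.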

> >
> >> I guess we also need to fence off any
> >> accesses respectively error out the device if the driver tries any
> >> read/write operations that would depend on that knowledge?
> >> 
> >> And using a field in VirtIODevice would probably need some care when
> >> migrating. Hm...
> >
> > It's just a shorthand to minimize changes. No need to migrate I think.
> 
> If we migrate in from an older QEMU, we don't know whether we are
> dealing with legacy or not, until feature negotiation is already
> done... don't we have to ask the transport?

Right but the only thing that can happen is config access.
Well and for legacy a kick I guess.

-- 
MST



* Re: [virtio-dev] Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-04 15:45                     ` [virtio-dev] " Cornelia Huck
@ 2021-10-04 20:01                       ` Michael S. Tsirkin
  2021-10-05  7:38                         ` Cornelia Huck
  2021-10-05 11:17                         ` Halil Pasic
  0 siblings, 2 replies; 52+ messages in thread
From: Michael S. Tsirkin @ 2021-10-04 20:01 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Halil Pasic, Jason Wang, Xie Yongji, virtualization,
	linux-kernel, markver, Christian Borntraeger, linux-s390,
	virtio-dev

On Mon, Oct 04, 2021 at 05:45:06PM +0200, Cornelia Huck wrote:
> On Mon, Oct 04 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Mon, Oct 04, 2021 at 04:27:23PM +0200, Cornelia Huck wrote:
> >> On Mon, Oct 04 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >> 
> >> > On Mon, Oct 04, 2021 at 02:01:14PM +0200, Cornelia Huck wrote:
> >> >> On Sun, Oct 03 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >> >> > @@ -160,6 +163,33 @@ \subsection{Legacy Interface: A Note on Feature
> >> >> >  Specification text within these sections generally does not apply
> >> >> >  to non-transitional devices.
> >> >> >  
> >> >> > +\begin{note}
> >> >> > +The device offers different features when used through
> >> >> > +the legacy interface and when operated in accordance with this
> >> >> > +specification.
> >> >> > +\end{note}
> >> >> > +
> >> >> > +Transitional drivers MUST use Devices only through the legacy interface
> >> >> 
> >> >> s/Devices only through the legacy interface/devices through the legacy
> >> >> interface only/
> >> >> 
> >> >> ?
> >> >
> >> > Both versions are actually confused, since how do you
> >> > find out that device does not offer VIRTIO_F_VERSION_1?
> >> >
> >> > I think what this should really say is
> >> >
> >> > Transitional drivers MUST NOT accept VIRTIO_F_VERSION_1 through
> >> > the legacy interface.
> >> 
> >> Ok, that makes sense.
> >> 
> >> Would it make sense that transitional drivers MUST accept VERSION_1
> >> through the non-legacy interface? Or is that redundant?
> >
> > We already have:
> >
> > A driver MUST accept VIRTIO_F_VERSION_1 if it is offered.
> 
> Yep, so it is redundant.
> 
> >
> >
> >> >
> >> >
> >> > Does linux actually satisfy this? Will it accept VIRTIO_F_VERSION_1
> >> > through the legacy interface if offered?
> >> 
> >> I think that the Linux drivers will not operate on feature bit 32+ if
> >> they are in legacy mode?
> >
> >
> > Well ... with PCI there's no *way* for host to set bit 32 through
> > legacy. But it might be possible with MMIO/CCW. Can you tell me
> > what happens then?
> 
> ccw does not support accessing bit 32+, either. Not sure about mmio.
> 
> >
> >
> >> >> 
> >> >> Generally, looks good to me.
> >> >
> >> > Do we want to also add explanation that features can be
> >> > changed until FEATURES_OK?
> >> 
> >> I always considered that to be implicit, as feature negotiation is not
> >> over until we have FEATURES_OK. Not sure whether we need an extra note.
> >
> > Well Halil here says once you set a feature bit you can't clear it.
> > So maybe not ...
> 
> Ok, so what about something like
> 
> "If FEATURES_OK is not set, the driver MAY change the set of features it
> accepts."
> 
> in the device initialization section?

Maybe "as long as". However Halil implied that some features are not
turned off properly if that happens. Halil could you pls provide
some examples?

-- 
MST



* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-04 13:11         ` Michael S. Tsirkin
  2021-10-04 14:33           ` Cornelia Huck
@ 2021-10-05  7:25           ` Halil Pasic
  2021-10-05  7:53             ` Michael S. Tsirkin
  1 sibling, 1 reply; 52+ messages in thread
From: Halil Pasic @ 2021-10-05  7:25 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Cornelia Huck, Jason Wang, Xie Yongji, virtualization,
	linux-kernel, markver, Christian Borntraeger, linux-s390,
	qemu-devel, Halil Pasic

On Mon, 4 Oct 2021 09:11:04 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> > >> static inline bool virtio_access_is_big_endian(VirtIODevice *vdev)
> > >> {
> > >> #if defined(LEGACY_VIRTIO_IS_BIENDIAN)
> > >>     return virtio_is_big_endian(vdev);
> > >> #elif defined(TARGET_WORDS_BIGENDIAN)
> > >>     if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
> > >>         /* Devices conforming to VIRTIO 1.0 or later are always LE. */
> > >>         return false;
> > >>     }
> > >>     return true;
> > >> #else
> > >>     return false;
> > >> #endif
> > >> }
> > >>   
> > >
> > > ok so that's a QEMU bug. Any virtio 1.0 and up
> > > compatible device must use LE.
> > > It can also present a legacy config space where the
> > > endian depends on the guest.  
> > 
> > So, how is the virtio core supposed to determine this? A
> > transport-specific callback?  
> 
> I'd say a field in VirtIODevice is easiest.

Wouldn't a call from transport code into virtio core
be more handy? What I have in mind is stuff like vhost-user and vdpa. My
understanding is that for vhost setups where the config is outside QEMU,
we probably need a new command that tells the vhost backend what
endianness to use for config. I don't think we can use
VHOST_USER_SET_VRING_ENDIAN because that one is on a virtqueue basis
according to the doc. So for vhost-user and similar we would fire that
command and probably also set the field, while for devices for which
control plane is handled by QEMU we would just set the field.

Does that sound about right?
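
A rough standalone sketch of that shape, to make the question concrete.
Everything here is invented for illustration (the struct names, the hook,
the toy backend); no such vhost-user command exists in the protocol today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Invented hook a backend with external config would implement. */
struct toy_backend_ops {
	void (*set_config_endian)(void *opaque, bool big_endian);
};

/* Invented stand-in for the core device state. */
struct toy_vdev {
	bool config_big_endian;			/* the field */
	const struct toy_backend_ops *ops;	/* NULL if QEMU owns config */
	void *opaque;
};

/* Called by transport code once it knows legacy vs. modern. */
void toy_virtio_set_config_endian(struct toy_vdev *vdev, bool big_endian)
{
	assert(vdev != NULL);
	vdev->config_big_endian = big_endian;
	if (vdev->ops && vdev->ops->set_config_endian)
		vdev->ops->set_config_endian(vdev->opaque, big_endian);
}

/* Toy external backend: just records what it was told. */
struct toy_backend {
	int msgs_received;
	bool big_endian;
};

void toy_backend_set_config_endian(void *opaque, bool big_endian)
{
	struct toy_backend *be = opaque;

	be->msgs_received++;
	be->big_endian = big_endian;
}

const struct toy_backend_ops toy_external_ops = {
	.set_config_endian = toy_backend_set_config_endian,
};
```

A device whose control plane lives in QEMU would leave ops NULL and only
get the field set; a vhost-user style device would additionally have the
message forwarded to its backend.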




* Re: [virtio-dev] Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-04 20:01                       ` Michael S. Tsirkin
@ 2021-10-05  7:38                         ` Cornelia Huck
  2021-10-05 11:17                         ` Halil Pasic
  1 sibling, 0 replies; 52+ messages in thread
From: Cornelia Huck @ 2021-10-05  7:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Halil Pasic, Jason Wang, Xie Yongji, virtualization,
	linux-kernel, markver, Christian Borntraeger, linux-s390,
	virtio-dev

On Mon, Oct 04 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Mon, Oct 04, 2021 at 05:45:06PM +0200, Cornelia Huck wrote:
>> On Mon, Oct 04 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> 
>> > On Mon, Oct 04, 2021 at 04:27:23PM +0200, Cornelia Huck wrote:
>> >> On Mon, Oct 04 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> >> 
>> >> > Do we want to also add explanation that features can be
>> >> > changed until FEATURES_OK?
>> >> 
>> >> I always considered that to be implicit, as feature negotiation is not
>> >> over until we have FEATURES_OK. Not sure whether we need an extra note.
>> >
>> > Well Halil here says once you set a feature bit you can't clear it.
>> > So maybe not ...
>> 
>> Ok, so what about something like
>> 
>> "If FEATURES_OK is not set, the driver MAY change the set of features it
>> accepts."
>> 
>> in the device initialization section?
>
> Maybe "as long as". However Halil implied that some features are not
> turned off properly if that happens. Halil could you pls provide
> some examples?

Yes, "as long as" sounds better.



* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-05  7:25           ` Halil Pasic
@ 2021-10-05  7:53             ` Michael S. Tsirkin
  2021-10-05 10:46               ` Halil Pasic
  0 siblings, 1 reply; 52+ messages in thread
From: Michael S. Tsirkin @ 2021-10-05  7:53 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Cornelia Huck, Jason Wang, Xie Yongji, virtualization,
	linux-kernel, markver, Christian Borntraeger, linux-s390,
	qemu-devel

On Tue, Oct 05, 2021 at 09:25:39AM +0200, Halil Pasic wrote:
> On Mon, 4 Oct 2021 09:11:04 -0400
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > > >> static inline bool virtio_access_is_big_endian(VirtIODevice *vdev)
> > > >> {
> > > >> #if defined(LEGACY_VIRTIO_IS_BIENDIAN)
> > > >>     return virtio_is_big_endian(vdev);
> > > >> #elif defined(TARGET_WORDS_BIGENDIAN)
> > > >>     if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
> > > >>         /* Devices conforming to VIRTIO 1.0 or later are always LE. */
> > > >>         return false;
> > > >>     }
> > > >>     return true;
> > > >> #else
> > > >>     return false;
> > > >> #endif
> > > >> }
> > > >>   
> > > >
> > > > ok so that's a QEMU bug. Any virtio 1.0 and up
> > > > compatible device must use LE.
> > > > It can also present a legacy config space where the
> > > > endian depends on the guest.  
> > > 
> > > So, how is the virtio core supposed to determine this? A
> > > transport-specific callback?  
> > 
> > I'd say a field in VirtIODevice is easiest.
> 
> Wouldn't a call from transport code into virtio core
> be more handy? What I have in mind is stuff like vhost-user and vdpa. My
> understanding is that for vhost setups where the config is outside QEMU,
> we probably need a new command that tells the vhost backend what
> endianness to use for config. I don't think we can use
> VHOST_USER_SET_VRING_ENDIAN because that one is on a virtqueue basis
> according to the doc. So for vhost-user and similar we would fire that
> command and probably also set the field, while for devices for which
> control plane is handled by QEMU we would just set the field.
> 
> Does that sound about right?

I'm fine either way, but when would you invoke this?
With my idea backends can check the field when get_config
is invoked.

As for using this in VHOST, can we maybe re-use SET_FEATURES?

Kind of hacky but nice in that it will actually make existing backends
work...

-- 
MST



* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-04  9:07               ` Michael S. Tsirkin
@ 2021-10-05 10:06                 ` Cornelia Huck
  2021-10-05 10:43                 ` Halil Pasic
  1 sibling, 0 replies; 52+ messages in thread
From: Cornelia Huck @ 2021-10-05 10:06 UTC (permalink / raw)
  To: Michael S. Tsirkin, Halil Pasic
  Cc: Jason Wang, Xie Yongji, virtualization, linux-kernel, markver,
	Christian Borntraeger, linux-s390, stefanha, qemu-devel,
	Raphael Norwitz

On Mon, Oct 04 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Mon, Oct 04, 2021 at 04:23:23AM +0200, Halil Pasic wrote:
>> --------------------------8<---------------------
>> 
>> From: Halil Pasic <pasic@linux.ibm.com>
>> Date: Thu, 30 Sep 2021 02:38:47 +0200
>> Subject: [PATCH] virtio: write back feature VERSION_1 before verify
>> 
>> This patch fixes a regression introduced by commit 82e89ea077b9
>> ("virtio-blk: Add validation for block size in config space") and
>> enables similar checks in verify() on big endian platforms.
>> 
>> The problem with checking multi-byte config fields in the verify
>> callback, on big endian platforms, and with a possibly transitional
>> device is the following. The verify() callback is called between
>> config->get_features() and virtio_finalize_features(). Given a
>> device that offered F_VERSION_1, we have the following options:
>> either the device is transitional, and then it has to present the legacy
>> interface, i.e. a big endian config space until F_VERSION_1 is
>> negotiated, or we have a non-transitional device, which makes
>> F_VERSION_1 mandatory, and only implements the non-legacy interface and
>> thus presents a little endian config space. Because at this point we
>> can't know if the device is transitional or non-transitional, we can't
>> know whether we need to byte swap or not.
>
> Well we established that we can know. Here's an alternative explanation:
>
> 	The virtio specification virtio-v1.1-cs01 states:
>
> 	Transitional devices MUST detect Legacy drivers by detecting that
> 	VIRTIO_F_VERSION_1 has not been acknowledged by the driver.
> 	This is exactly what QEMU as of 6.1 has done relying solely
> 	on VIRTIO_F_VERSION_1 for detecting that.
>
> 	However, the specification also says:
> 	driver MAY read (but MUST NOT write) the device-specific
> 	configuration fields to check that it can support the device before
> 	accepting it.
>
> 	In that case, any device relying solely on VIRTIO_F_VERSION_1
> 	for detecting legacy drivers will return data in legacy format.
> 	In particular, this implies that it is in big endian format
> 	for big endian guests. This naturally confuses the driver
> 	which expects little endian in the modern mode.
>
> 	It is probably a good idea to amend the spec to clarify that
> 	VIRTIO_F_VERSION_1 can only be relied on after the feature negotiation
> 	is complete. However, we already have regression so let's
> 	try to address it.

I prefer that explanation.

>
>
>> 
>> The virtio spec explicitly states that the driver MAY read config
>> between reading and writing the features, so declaring that the driver
>> must not access the config before feature negotiation is done is not an
>> option. The specification isn't clear about setting the features multiple
>> times before FEATURES_OK, so I guess it should be fine to set F_VERSION_1,
>> since at this point we already know that we are about to negotiate
>> F_VERSION_1.
>> 
>> I don't consider this patch super clean, but frankly I don't think we
>> have a ton of options. Another option that may or may not be cleaner,
>> but is also IMHO much uglier, is to figure out whether the device is
>> transitional by rejecting _F_VERSION_1, then resetting it and proceeding
>> according to what we have figured out, hoping that the characteristics
>> of the device didn't change.
>
> An empty line before tags.
>
>> Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
>> Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config space")
>> Reported-by: markver@us.ibm.com
>
> Let's add more commits that are affected. E.g. virtio-net with MTU
> feature bit set is affected too.
>
> So let's add Fixes tag for:
> commit 14de9d114a82a564b94388c95af79a701dc93134
> Author: Aaron Conole <aconole@redhat.com>
> Date:   Fri Jun 3 16:57:12 2016 -0400
>
>     virtio-net: Add initial MTU advice feature
>     
> I think that's all, but pls double check me.

I could not find anything else after a quick check.

>
>
>> ---
>>  drivers/virtio/virtio.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>> 
>> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
>> index 0a5b54034d4b..2b9358f2e22a 100644
>> --- a/drivers/virtio/virtio.c
>> +++ b/drivers/virtio/virtio.c
>> @@ -239,6 +239,12 @@ static int virtio_dev_probe(struct device *_d)
>>  		driver_features_legacy = driver_features;
>>  	}
>>  
>> +	/* Write F_VERSION_1 feature to pin down endianness */
>> +	if (device_features & (1ULL << VIRTIO_F_VERSION_1) & driver_features) {
>> +		dev->features = (1ULL << VIRTIO_F_VERSION_1);
>> +		dev->config->finalize_features(dev);
>> +	}
>> +
>>  	if (device_features & (1ULL << VIRTIO_F_VERSION_1))
>>  		dev->features = driver_features & device_features;
>>  	else
>> -- 
>> 2.31.1

I think we should go with this just to fix the nasty regression for now.



* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-04  9:07               ` Michael S. Tsirkin
  2021-10-05 10:06                 ` Cornelia Huck
@ 2021-10-05 10:43                 ` Halil Pasic
  2021-10-05 11:11                   ` Michael S. Tsirkin
  2021-10-05 11:13                   ` Cornelia Huck
  1 sibling, 2 replies; 52+ messages in thread
From: Halil Pasic @ 2021-10-05 10:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-s390, markver, Christian Borntraeger, qemu-devel,
	Cornelia Huck, linux-kernel, virtualization, Xie Yongji,
	stefanha, Raphael Norwitz, Halil Pasic

On Mon, 4 Oct 2021 05:07:13 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Mon, Oct 04, 2021 at 04:23:23AM +0200, Halil Pasic wrote:
> > On Sat, 2 Oct 2021 14:13:37 -0400
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >   
> > > > Anyone else have an idea? This is a nasty regression; we could revert the
> > > > patch, which would remove the symptoms and give us some time, but that
> > > > doesn't really feel right, I'd do that only as a last resort.    
> > > 
> > > Well we have Halil's hack (except I would limit it
> > > to only apply to BE, only do devices with validate,
> > > and only in modern mode), and we will fix QEMU to be spec compliant.
> > > Between these why do we need any conditional compiles?  
> > 
> > We don't. As I stated before, this hack is flawed because it
> > effectively breaks fencing features by the driver with QEMU. Some
> > features can not be unset after once set, because we tend to try to
> > enable the corresponding functionality whenever we see a write
> > features operation with the feature bit set, and we don't disable, if a
> > subsequent features write operation stores the feature bit as not set.  
> 
> Something to fix in QEMU too, I think.

Possibly. But it is the same situation: it probably has a long
history. And it may even make some sense. The obvious trigger for
doing the conditional initialization for modern is the setting of
FEATURES_OK. The problem is, legacy doesn't do FEATURES_OK. So we would
need a different trigger.
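
A toy model of the two behaviours, as a standalone sketch (not QEMU code;
the feature bit and all names are invented). The eager variant enables
functionality on any features write that has the bit set and never turns
it off again; the deferred variant only acts once FEATURES_OK is set,
which is exactly the trigger legacy does not have.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_F_SOMETHING 5	/* invented feature bit, illustration only */

struct toy_dev {
	uint64_t features;	/* last value written by the driver */
	bool func_enabled;	/* functionality tied to TOY_F_SOMETHING */
};

/* Eager variant: enable on any write with the bit set, never disable. */
void toy_eager_write_features(struct toy_dev *d, uint64_t features)
{
	assert(d != NULL);
	d->features = features;
	if (features & (1ULL << TOY_F_SOMETHING))
		d->func_enabled = true;
}

/* Deferred variant: a features write only records the value ... */
void toy_deferred_write_features(struct toy_dev *d, uint64_t features)
{
	assert(d != NULL);
	d->features = features;
}

/* ... and FEATURES_OK is the single point where it takes effect. */
void toy_deferred_features_ok(struct toy_dev *d)
{
	assert(d != NULL);
	d->func_enabled = (d->features & (1ULL << TOY_F_SOMETHING)) != 0;
}
```

Writing the bit and then clearing it leaves the eager device with the
functionality still enabled, while the deferred device ends up with it
disabled after FEATURES_OK.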

> 
> > But it looks like VIRTIO_1 is fine to get cleared afterwards.  
> 
> We'd never clear it though - why would we?
> 

Right.

> > So my hack
> > should actually look like posted below, modulo conditions.  
> 
> 
> Looking at it some more, I see that vhost-user actually
> does not send features to the backend until FEATURES_OK.

I.e. the hack does not work for transitional vhost-user devices,
but it doesn't break them either.

Furthermore, I believe there is not much we can do to support
transitional devices with vhost-user and similar, without extending
the protocol. The transport specific detection idea would need a new
vhost-user thingy to tell the device what has been figured
out, right?

In theory modern only could work, if the backends were paying extra
attention to endianness, instead of just assuming that the code is
running little-endian.

> However, the code in contrib for vhost-user-blk at least seems
> broken wrt endian-ness ATM.

Agree. For example config is native endian ATM AFAICT. 

> What about other backends though?

I think whenever the config is owned and managed by the vhost-backend
we have a problem with transitional. And we don't have everything in
the protocol to deal with this problem.

I didn't check modern for the different vhost-user backends. I don't
think we recommend our users on s390 to use those. My understanding
of the use-cases is far from complete.

> Hard to be sure right?

I agree.

> Cc Raphael and Stefan so they can take a look.
> And I guess it's time we CC'd qemu-devel too.
> 
> For now I am beginning to think we should either revert or just limit
> validation to LE and think about all this some more. And I am inclining
> to do a revert.

I'm fine with either of these as a quick fix, but we will eventually have
to find a solution. AFAICT this solution works for the s390 setups we
care about the most, but so would a revert.



> These are all hypervisors that shipped for a long time.
> Do we need a flag for early config space access then?

You mean a feature bit? I think it is a good idea even if
it weren't strictly necessary. We will have a behavior change
for some devices, and I think the ability to detect those
is valuable.

Your spec change proposal makes it IMHO pretty clear that
we are changing our understanding of how transitional should work.
Strictly, transitional is not a normative part of the spec AFAIU,
but still...


> 
> 
> 
> > 
> > Regarding the conditions I guess checking that driver_features has
> > F_VERSION_1 already satisfies "only modern mode", or?  
> 
> Right.
> 
> > For now
> > I've deliberately omitted the has verify and the is big endian
> > conditions so we have a better chance to see if something breaks
> > (i.e. the approach does not work). I can add in those extra conditions
> > later.  
> 
> Or maybe if we will go down that road just the verify check (for
> performance). I'm a bit unhappy we have the extra exit but consistency
> seems more important.
> 

I'm fine either way. The extra exit happens only during initialization,
once per device; I have no feeling for whether this has a measurable
performance impact.


> > 
> > --------------------------8<---------------------
> > 
> > From: Halil Pasic <pasic@linux.ibm.com>
> > Date: Thu, 30 Sep 2021 02:38:47 +0200
> > Subject: [PATCH] virtio: write back feature VERSION_1 before verify
> > 
> > This patch fixes a regression introduced by commit 82e89ea077b9
> > ("virtio-blk: Add validation for block size in config space") and
> > enables similar checks in verify() on big endian platforms.
> > 
> > The problem with checking multi-byte config fields in the verify
> > callback, on big endian platforms, and with a possibly transitional
> > device is the following. The verify() callback is called between
> > config->get_features() and virtio_finalize_features(). That we have a
> > device that offered F_VERSION_1 then we have the following options
> > either the device is transitional, and then it has to present the legacy
> > interface, i.e. a big endian config space until F_VERSION_1 is
> > negotiated, or we have a non-transitional device, which makes
> > F_VERSION_1 mandatory, and only implements the non-legacy interface and
> > thus presents a little endian config space. Because at this point we
> > can't know if the device is transitional or non-transitional, we can't
> > know do we need to byte swap or not.  
> 
> Well we established that we can know. Here's an alternative explanation:


I think we established how this should be in the future, where a
transport-specific mechanism is used to decide whether we are operating in
legacy mode or in modern mode. But with the current QEMU reality, I don't
think so. Namely, currently the switch from native-endian config to
little-endian config happens when VERSION_1 is negotiated, which may
happen whenever the VERSION_1 bit is changed, or only when FEATURES_OK is
set (vhost-user).

This is consistent with the rule that a device should detect a legacy
driver by checking for VERSION_1, which is what the spec currently says.

So for transitional we start out with native-endian config. For
modern-only, the config is always LE.

The guest can distinguish between a legacy only device and a modern
capable device after the revision negotiation. A legacy device would
reject the CCW.

But both a transitional device and a modern only device would accept
a revision > 0. So the guest does not know for ccw.
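
To make the above concrete, here is a minimal sketch of the behavior as I
described it. It is a toy model with made-up names (dev_kind, config_is_le,
driver_sees and friends are hypothetical, not QEMU or kernel code), and it
assumes a big endian guest, so "native-endian" means big endian:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two device flavours that offer VERSION_1. */
enum dev_kind { DEV_TRANSITIONAL, DEV_MODERN_ONLY };

/* True if the device currently presents its config space as little endian. */
static bool config_is_le(enum dev_kind kind, bool version_1_acked)
{
	/* Modern-only: always LE. Transitional: LE only once VERSION_1 is acked. */
	return kind == DEV_MODERN_ONLY || version_1_acked;
}

/* Store a 32-bit field as LE, or as BE (native for the assumed BE guest). */
static void device_store32(uint8_t cfg[4], uint32_t val, bool le)
{
	for (int i = 0; i < 4; i++) {
		int shift = le ? 8 * i : 8 * (3 - i);
		cfg[i] = (uint8_t)(val >> shift);
	}
}

/* A modern driver always decodes config fields as little endian. */
static uint32_t driver_load32_le(const uint8_t cfg[4])
{
	return (uint32_t)cfg[0] | (uint32_t)cfg[1] << 8 |
	       (uint32_t)cfg[2] << 16 | (uint32_t)cfg[3] << 24;
}

/* What a modern driver reads for a field the device set to val. */
static uint32_t driver_sees(enum dev_kind kind, bool version_1_acked,
			    uint32_t val)
{
	uint8_t cfg[4];

	device_store32(cfg, val, config_is_le(kind, version_1_acked));
	return driver_load32_le(cfg);
}
```

In this model a block size of 512 on a transitional device reads back as
0x00020000 before VERSION_1 is written and as 512 afterwards, while a
modern-only device yields 512 in both cases. The driver cannot tell the
two device kinds apart at that point, which is the whole problem.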



> 
> 	The virtio specification virtio-v1.1-cs01 states:
> 
> 	Transitional devices MUST detect Legacy drivers by detecting that
> 	VIRTIO_F_VERSION_1 has not been acknowledged by the driver.
> 	This is exactly what QEMU as of 6.1 has done relying solely
> 	on VIRTIO_F_VERSION_1 for detecting that.
> 
> 	However, the specification also says:
> 	driver MAY read (but MUST NOT write) the device-specific
> 	configuration fields to check that it can support the device before
> 	accepting it.

s/ accepting it/setting FEATURES_OK
> 
> 	In that case, any device relying solely on VIRTIO_F_VERSION_1

s/any device/any transitional device/

> 	for detecting legacy drivers will return data in legacy format.

E.g. virtio-crypto does not support legacy, and thus it is always
providing an LE config space.

> 	In particular, this implies that it is in big endian format
> 	for big endian guests. This naturally confuses the driver
> 	which expects little endian in the modern mode.
> 
> 	It is probably a good idea to amend the spec to clarify that
> 	VIRTIO_F_VERSION_1 can only be relied on after the feature negotiation
> 	is complete. However, we already have regression so let's
> 	try to address it.
> 
> 

I can take the new description without any changes if you like. I care
more about getting a decent fix than a perfect patch description. Should
I send out a non-RFC that implements the proposed changes?

> > 
> > The virtio spec explicitly states that the driver MAY read config
> > between reading and writing the features so saying that first accessing
> > the config before feature negotiation is done is not an option. The
> > specification ain't clear about setting the features multiple times
> > before FEATURES_OK, so I guess that should be fine to set F_VERSION_1
> > since at this point we already know that we are about to negotiate
> > F_VERSION_1.
> > 
> > I don't consider this patch super clean, but frankly I don't think we
> > have a ton of options. Another option that may or man not be cleaner,
> > but is also IMHO much uglier is to figure out whether the device is
> > transitional by rejecting _F_VERSION_1, then resetting it and proceeding
> > according tho what we have figured out, hoping that the characteristics
> > of the device didn't change.  
> 
> An empty line before tags.
>

Sure!
 
> > Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
> > Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config space")
> > Reported-by: markver@us.ibm.com  
> 
> Let's add more commits that are affected. E.g. virtio-net with MTU
> feature bit set is affected too.
> 
> So let's add Fixes tag for:
> commit 14de9d114a82a564b94388c95af79a701dc93134
> Author: Aaron Conole <aconole@redhat.com>
> Date:   Fri Jun 3 16:57:12 2016 -0400
> 
>     virtio-net: Add initial MTU advice feature
>     

I believe drv->probe(dev) is called after the real finalize, so
that access should be fine, or?

Don't we just have to look out for verify?

Isn't the problematic commit fe36cbe0671e ("virtio_net: clear MTU when
out of range")?

The problem with commit 14de9d114a82a is that the device won't know that
the driver didn't take the advice (for the MTU, because it deemed its
value invalid). But that doesn't really hurt us.

On the other hand, with fe36cbe0671e we may deem a valid MTU in the
config space invalid because of the endianness mess-up. In that case
we would discard a perfectly good MTU advice.
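
A small sketch of how that can happen, under the assumption that the
range check is a simple lower bound of 68 (which is what I believe
ETH_MIN_MTU is). The names swab16_sketch, mtu_advice_kept and
SKETCH_MIN_MTU are made up for illustration and are not the driver code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed lower bound for a usable MTU (believed to match ETH_MIN_MTU). */
#define SKETCH_MIN_MTU 68

/* Swap the two bytes of a 16-bit value: the effect of reading BE as LE. */
static uint16_t swab16_sketch(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* Model of the range check: true if the MTU advice would be kept. */
static bool mtu_advice_kept(uint16_t mtu_as_read)
{
	return mtu_as_read >= SKETCH_MIN_MTU;
}
```

With a jumbo MTU of 9216 (0x2400) the byte-swapped value is 36 (0x0024),
which falls below the bound, so the advice would be dropped even though
9216 itself is fine. An MTU of 1500 swaps to 56325, which passes the
check but is still the wrong value.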

> I think that's all, but pls double check me.


Looks good!
$ git grep -e '\.validate' -- '*virtio*'
drivers/block/virtio_blk.c:     .validate                       = virtblk_validate,
drivers/firmware/arm_scmi/virtio.c:     .validate = scmi_vio_validate,
drivers/net/virtio_net.c:       .validate =     virtnet_validate,
drivers/virtio/virtio_balloon.c:        .validate =     virtballoon_validate,
sound/virtio/virtio_card.c:     .validate = virtsnd_validate,

But only blk and net access config space from validate.

> 
> 
> > ---
> >  drivers/virtio/virtio.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > index 0a5b54034d4b..2b9358f2e22a 100644
> > --- a/drivers/virtio/virtio.c
> > +++ b/drivers/virtio/virtio.c
> > @@ -239,6 +239,12 @@ static int virtio_dev_probe(struct device *_d)
> >  		driver_features_legacy = driver_features;
> >  	}
> >  
> > +	/* Write F_VERSION_1 feature to pin down endianness */
> > +	if (device_features & (1ULL << VIRTIO_F_VERSION_1) & driver_features) {
> > +		dev->features = (1ULL << VIRTIO_F_VERSION_1);
> > +		dev->config->finalize_features(dev);
> > +	}
> > +
> >  	if (device_features & (1ULL << VIRTIO_F_VERSION_1))
> >  		dev->features = driver_features & device_features;
> >  	else
> > -- 
> > 2.31.1
> > 
> > 
> > 
> > 
> > 
> >    
> 


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-05  7:53             ` Michael S. Tsirkin
@ 2021-10-05 10:46               ` Halil Pasic
  2021-10-05 11:11                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 52+ messages in thread
From: Halil Pasic @ 2021-10-05 10:46 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-s390, markver, Christian Borntraeger, qemu-devel,
	Jason Wang, Cornelia Huck, linux-kernel, virtualization,
	Xie Yongji, Halil Pasic

On Tue, 5 Oct 2021 03:53:17 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> > Wouldn't a call from transport code into virtio core
> > be more handy? What I have in mind is stuff like vhost-user and vdpa. My
> > understanding is, that for vhost setups where the config is outside qemu,
> > we probably need a new  command that tells the vhost backend what
> > endiannes to use for config. I don't think we can use
> > VHOST_USER_SET_VRING_ENDIAN because  that one is on a virtqueue basis
> > according to the doc. So for vhost-user and similar we would fire that
> > command and probably also set the filed, while for devices for which
> > control plane is handled by QEMU we would just set the field.
> > 
> > Does that sound about right?  
> 
> I'm fine either way, but when would you invoke this?
> With my idea backends can check the field when get_config
> is invoked.
> 
> As for using this in VHOST, can we maybe re-use SET_FEATURES?
> 
> Kind of hacky but nice in that it will actually make existing backends
> work...

Basically the equivalent of this patch, just on the vhost interface,
right? Could work; I have to look into it :)

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-05 10:43                 ` Halil Pasic
@ 2021-10-05 11:11                   ` Michael S. Tsirkin
  2021-10-05 11:13                   ` Cornelia Huck
  1 sibling, 0 replies; 52+ messages in thread
From: Michael S. Tsirkin @ 2021-10-05 11:11 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, markver, Christian Borntraeger, qemu-devel,
	Cornelia Huck, linux-kernel, virtualization, Xie Yongji,
	stefanha, Raphael Norwitz

On Tue, Oct 05, 2021 at 12:43:03PM +0200, Halil Pasic wrote:
> On Mon, 4 Oct 2021 05:07:13 -0400
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Mon, Oct 04, 2021 at 04:23:23AM +0200, Halil Pasic wrote:
> > > On Sat, 2 Oct 2021 14:13:37 -0400
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > >   
> > > > > Anyone else have an idea? This is a nasty regression; we could revert the
> > > > > patch, which would remove the symptoms and give us some time, but that
> > > > > doesn't really feel right, I'd do that only as a last resort.    
> > > > 
> > > > Well we have Halil's hack (except I would limit it
> > > > to only apply to BE, only do devices with validate,
> > > > and only in modern mode), and we will fix QEMU to be spec compliant.
> > > > Between these why do we need any conditional compiles?  
> > > 
> > > We don't. As I stated before, this hack is flawed because it
> > > effectively breaks fencing features by the driver with QEMU. Some
> > > features can not be unset after once set, because we tend to try to
> > > enable the corresponding functionality whenever we see a write
> > > features operation with the feature bit set, and we don't disable, if a
> > > subsequent features write operation stores the feature bit as not set.  
> > 
> > Something to fix in QEMU too, I think.
> 
> Possibly. But it is the same situation: it probably has a long
> history. And it may even make some sense. The obvious trigger for
> doing the conditional initialization for modern is the setting of
> FEATURES_OK. The problem is, legacy doesn't do FEATURES_OK. So we would
> need a different trigger.
> 
> > 
> > > But it looks like VIRTIO_1 is fine to get cleared afterwards.  
> > 
> > We'd never clear it though - why would we?
> > 
> 
> Right.
> 
> > > So my hack
> > > should actually look like posted below, modulo conditions.  
> > 
> > 
> > Looking at it some more, I see that vhost-user actually
> > does not send features to the backend until FEATURES_OK.
> 
> I.e. the hack does not work for transitional vhost-user devices,
> but it doesn't break them either.
> 
> Furthermore, I believe there is not much we can do to support
> transitional devices with vhost-user and similar, without extending
> the protocol. The transport specific detection idea would need a new
> vhost-user thingy to tell the device what has been figured
> out, right?
> 
> In theory modern only could work, if the backends were paying extra
> attention to endianness, instead of just assuming that the code is
> running little-endian.

I think a reasonable thing is to send SET_FEATURES before each
GET_CONFIG, to tell the backend which format is expected.
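
Roughly what I have in mind, as a toy model only. The struct and the
sketch_* functions below are hypothetical and are not the vhost-user
protocol or any existing backend; the only real constant is the
VIRTIO_F_VERSION_1 bit number (32):

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_F_VERSION_1 32 /* VIRTIO_F_VERSION_1 bit number */

/* Hypothetical backend state; a real backend keeps far more than this. */
struct sketch_backend {
	uint64_t features;  /* last value received via SET_FEATURES */
	bool guest_is_be;   /* endianness of the legacy (guest-native) format */
	uint32_t blk_size;  /* example multi-byte config field */
};

/* Handler for SET_FEATURES: just remember what the front end sent. */
static void sketch_set_features(struct sketch_backend *b, uint64_t features)
{
	b->features = features;
}

/* Handler for GET_CONFIG: pick the format implied by the last SET_FEATURES. */
static void sketch_get_config(const struct sketch_backend *b, uint8_t out[4])
{
	bool modern = (b->features & (1ULL << SKETCH_F_VERSION_1)) != 0;
	bool le = modern || !b->guest_is_be;

	for (int i = 0; i < 4; i++)
		out[i] = (uint8_t)(b->blk_size >> (le ? 8 * i : 8 * (3 - i)));
}
```

The point is that the backend needs no new message: as long as the
front end sends the current features first, the backend can answer each
config read in the format the driver expects at that moment.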

> > However, the code in contrib for vhost-user-blk at least seems
> > broken wrt endian-ness ATM.
> 
> Agree. For example config is native endian ATM AFAICT. 
> 
> > What about other backends though?
> 
> I think whenever the config is owned and managed by the vhost-backend
> we have a problem with transitional. And we don't have everything in
> the protocol to deal with this problem.
> 
> I didn't check modern for the different vhost-user backends. I don't
> think we recommend our users on s390 to use those. My understanding
> of the use-cases is far form complete.
> 
> > Hard to be sure right?
> 
> I agree.
> 
> > Cc Raphael and Stefan so they can take a look.
> > And I guess it's time we CC'd qemu-devel too.
> > 
> > For now I am beginning to think we should either revert or just limit
> > validation to LE and think about all this some more. And I am inclining
> > to do a revert.
> 
> I'm fine with either of these as a quick fix, but we will eventually have
> to find a solution. AFAICT this solution works for the s390 setups we
> care about the most, but so would a revert.

The reason I like this one is that it also fixes MTU for virtio net,
and that one we can't really revert.


> 
> 
> > These are all hypervisors that shipped for a long time.
> > Do we need a flag for early config space access then?
> 
> You mean a feature bit? I think it is a good idea even if
> it weren't strictly necessary. We will have a behavior change
> for some devices, and I think the ability to detect those
> is valuable.
> 
> Your spec change proposal, makes it IMHO pretty clear, that
> we are changing our understanding of how transitional should work.
> Strictly, transitional is not a normative part of the spec AFAIU,
> but still...
> 
> 
> > 
> > 
> > 
> > > 
> > > Regarding the conditions I guess checking that driver_features has
> > > F_VERSION_1 already satisfies "only modern mode", or?  
> > 
> > Right.
> > 
> > > For now
> > > I've deliberately omitted the has verify and the is big endian
> > > conditions so we have a better chance to see if something breaks
> > > (i.e. the approach does not work). I can add in those extra conditions
> > > later.  
> > 
> > Or maybe if we will go down that road just the verify check (for
> > performance). I'm a bit unhappy we have the extra exit but consistency
> > seems more important.
> > 
> 
> I'm fine either way. The extra exit is only for the initialization and
> one per 1 device, I have no feeling if this has a measurable performance
> impact.
> 
> 
> > > 
> > > --------------------------8<---------------------
> > > 
> > > From: Halil Pasic <pasic@linux.ibm.com>
> > > Date: Thu, 30 Sep 2021 02:38:47 +0200
> > > Subject: [PATCH] virtio: write back feature VERSION_1 before verify
> > > 
> > > This patch fixes a regression introduced by commit 82e89ea077b9
> > > ("virtio-blk: Add validation for block size in config space") and
> > > enables similar checks in verify() on big endian platforms.
> > > 
> > > The problem with checking multi-byte config fields in the verify
> > > callback, on big endian platforms, and with a possibly transitional
> > > device is the following. The verify() callback is called between
> > > config->get_features() and virtio_finalize_features(). That we have a
> > > device that offered F_VERSION_1 then we have the following options
> > > either the device is transitional, and then it has to present the legacy
> > > interface, i.e. a big endian config space until F_VERSION_1 is
> > > negotiated, or we have a non-transitional device, which makes
> > > F_VERSION_1 mandatory, and only implements the non-legacy interface and
> > > thus presents a little endian config space. Because at this point we
> > > can't know if the device is transitional or non-transitional, we can't
> > > know do we need to byte swap or not.  
> > 
> > Well we established that we can know. Here's an alternative explanation:
> 
> 
> I thin we established how this should be in the future, where a transport
> specific mechanism is used to decide are we operating in legacy mode or
> in modern mode. But with the current QEMU reality, I don't think so.
> Namely currently the switch native-endian config -> little endian config
> happens when the VERSION_1 is negotiated, which may happen whenever
> the VERSION_1 bit is changed, or only when FEATURES_OK is set
> (vhost-user).
> 
> This is consistent with device should detect a legacy driver by checking
> for VERSION_1, which is what the spec currently says.
> 
> So for transitional we start out with native-endian config. For modern
> only the config is always LE.
> 
> The guest can distinguish between a legacy only device and a modern
> capable device after the revision negotiation. A legacy device would
> reject the CCW.
> 
> But both a transitional device and a modern only device would accept
> a revision > 0. So the guest does not know for ccw.
> 


Sorry, I was talking about the host, not the guest.
When the host sees revision > 0 it knows it's a modern guest,
and so the config should be LE.

> 
> > 
> > 	The virtio specification virtio-v1.1-cs01 states:
> > 
> > 	Transitional devices MUST detect Legacy drivers by detecting that
> > 	VIRTIO_F_VERSION_1 has not been acknowledged by the driver.
> > 	This is exactly what QEMU as of 6.1 has done relying solely
> > 	on VIRTIO_F_VERSION_1 for detecting that.
> > 
> > 	However, the specification also says:
> > 	driver MAY read (but MUST NOT write) the device-specific
> > 	configuration fields to check that it can support the device before
> > 	accepting it.
> 
> s/ accepting it/setting FEATURES_OK
> > 
> > 	In that case, any device relying solely on VIRTIO_F_VERSION_1
> 
> s/any device/any transitional device/
> 
> > 	for detecting legacy drivers will return data in legacy format.
> 
> E.g. virtio-crypto does not support legacy, and thus it is always
> providing an LE config space.
> 
> > 	In particular, this implies that it is in big endian format
> > 	for big endian guests. This naturally confuses the driver
> > 	which expects little endian in the modern mode.
> > 
> > 	It is probably a good idea to amend the spec to clarify that
> > 	VIRTIO_F_VERSION_1 can only be relied on after the feature negotiation
> > 	is complete. However, we already have regression so let's
> > 	try to address it.
> > 
> > 
> 
> I can take the new description without any changes if you like.
> I care
> more about getting a decent fix, than a perfect patch description. Should
> I send out a non-RFC with that implements the proposed changes?

Also add a shortened version to the code comment pls.

> 
> > > 
> > > The virtio spec explicitly states that the driver MAY read config
> > > between reading and writing the features so saying that first accessing
> > > the config before feature negotiation is done is not an option. The
> > > specification ain't clear about setting the features multiple times
> > > before FEATURES_OK, so I guess that should be fine to set F_VERSION_1
> > > since at this point we already know that we are about to negotiate
> > > F_VERSION_1.
> > > 
> > > I don't consider this patch super clean, but frankly I don't think we
> > > have a ton of options. Another option that may or man not be cleaner,
> > > but is also IMHO much uglier is to figure out whether the device is
> > > transitional by rejecting _F_VERSION_1, then resetting it and proceeding
> > > according tho what we have figured out, hoping that the characteristics
> > > of the device didn't change.  
> > 
> > An empty line before tags.
> >
> 
> Sure!
>  
> > > Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
> > > Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config space")
> > > Reported-by: markver@us.ibm.com  
> > 
> > Let's add more commits that are affected. E.g. virtio-net with MTU
> > feature bit set is affected too.
> > 
> > So let's add Fixes tag for:
> > commit 14de9d114a82a564b94388c95af79a701dc93134
> > Author: Aaron Conole <aconole@redhat.com>
> > Date:   Fri Jun 3 16:57:12 2016 -0400
> > 
> >     virtio-net: Add initial MTU advice feature
> >     
> 
> I believe  drv->probe(dev) is called after the real finalize, so
> that access should be fine or?
> 
> Don't we just have to look out for verify?

you mean validate.


> Isn't the problematic commit fe36cbe0671e ("virtio_net: clear MTU when
> out of range")?

exactly.

> The problem with commit 14de9d114a82a is that the device won't know,
> the driver didn't take the advice (for the MTU because it deemed its
> value invalid). But that doesn't really hurt us.
> On the other hand with fe36cbe0671e we may deem a valid MTU in the
> config space invalid because of the endiannes mess-up. I that case
> we would discard a perfectly good MTU advice.

right.

> 
> > I think that's all, but pls double check me.
> 
> 
> Looks good!
> $ git grep -e '\.validate' -- '*virtio*'
> drivers/block/virtio_blk.c:     .validate                       = virtblk_validate,
> drivers/firmware/arm_scmi/virtio.c:     .validate = scmi_vio_validate,
> drivers/net/virtio_net.c:       .validate =     virtnet_validate,
> drivers/virtio/virtio_balloon.c:        .validate =     virtballoon_validate,
> sound/virtio/virtio_card.c:     .validate = virtsnd_validate,
> 
> But only blk and net access config space from validate.
> 
> > 
> > 
> > > ---
> > >  drivers/virtio/virtio.c | 6 ++++++
> > >  1 file changed, 6 insertions(+)
> > > 
> > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > index 0a5b54034d4b..2b9358f2e22a 100644
> > > --- a/drivers/virtio/virtio.c
> > > +++ b/drivers/virtio/virtio.c
> > > @@ -239,6 +239,12 @@ static int virtio_dev_probe(struct device *_d)
> > >  		driver_features_legacy = driver_features;
> > >  	}
> > >  
> > > +	/* Write F_VERSION_1 feature to pin down endianness */
> > > +	if (device_features & (1ULL << VIRTIO_F_VERSION_1) & driver_features) {
> > > +		dev->features = (1ULL << VIRTIO_F_VERSION_1);
> > > +		dev->config->finalize_features(dev);
> > > +	}
> > > +
> > >  	if (device_features & (1ULL << VIRTIO_F_VERSION_1))
> > >  		dev->features = driver_features & device_features;
> > >  	else
> > > -- 
> > > 2.31.1
> > > 
> > > 
> > > 
> > > 
> > > 
> > >    
> > 


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-05 10:46               ` Halil Pasic
@ 2021-10-05 11:11                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 52+ messages in thread
From: Michael S. Tsirkin @ 2021-10-05 11:11 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, markver, Christian Borntraeger, qemu-devel,
	Jason Wang, Cornelia Huck, linux-kernel, virtualization,
	Xie Yongji

On Tue, Oct 05, 2021 at 12:46:34PM +0200, Halil Pasic wrote:
> On Tue, 5 Oct 2021 03:53:17 -0400
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > > Wouldn't a call from transport code into virtio core
> > > be more handy? What I have in mind is stuff like vhost-user and vdpa. My
> > > understanding is, that for vhost setups where the config is outside qemu,
> > > we probably need a new  command that tells the vhost backend what
> > > endiannes to use for config. I don't think we can use
> > > VHOST_USER_SET_VRING_ENDIAN because  that one is on a virtqueue basis
> > > according to the doc. So for vhost-user and similar we would fire that
> > > command and probably also set the filed, while for devices for which
> > > control plane is handled by QEMU we would just set the field.
> > > 
> > > Does that sound about right?  
> > 
> > I'm fine either way, but when would you invoke this?
> > With my idea backends can check the field when get_config
> > is invoked.
> > 
> > As for using this in VHOST, can we maybe re-use SET_FEATURES?
> > 
> > Kind of hacky but nice in that it will actually make existing backends
> > work...
> 
> Basically the equivalent of this patch, just on the vhost interface,
> right? Could work I have to look into it :)

yep


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-05 10:43                 ` Halil Pasic
  2021-10-05 11:11                   ` Michael S. Tsirkin
@ 2021-10-05 11:13                   ` Cornelia Huck
  2021-10-05 11:20                     ` Michael S. Tsirkin
  2021-10-05 11:59                     ` Halil Pasic
  1 sibling, 2 replies; 52+ messages in thread
From: Cornelia Huck @ 2021-10-05 11:13 UTC (permalink / raw)
  To: Halil Pasic, Michael S. Tsirkin
  Cc: linux-s390, markver, Christian Borntraeger, qemu-devel,
	linux-kernel, virtualization, Xie Yongji, stefanha,
	Raphael Norwitz, Halil Pasic

On Tue, Oct 05 2021, Halil Pasic <pasic@linux.ibm.com> wrote:

> On Mon, 4 Oct 2021 05:07:13 -0400
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> Well we established that we can know. Here's an alternative explanation:
>
>
> I thin we established how this should be in the future, where a transport
> specific mechanism is used to decide are we operating in legacy mode or
> in modern mode. But with the current QEMU reality, I don't think so.
> Namely currently the switch native-endian config -> little endian config
> happens when the VERSION_1 is negotiated, which may happen whenever
> the VERSION_1 bit is changed, or only when FEATURES_OK is set
> (vhost-user).
>
> This is consistent with device should detect a legacy driver by checking
> for VERSION_1, which is what the spec currently says.
>
> So for transitional we start out with native-endian config. For modern
> only the config is always LE.
>
> The guest can distinguish between a legacy only device and a modern
> capable device after the revision negotiation. A legacy device would
> reject the CCW.
>
> But both a transitional device and a modern only device would accept
> a revision > 0. So the guest does not know for ccw.

Well, for pci I think the driver knows that it is using either legacy or
modern, no?

And for ccw, the driver knows at that point in time which revision it
negotiated, so it should know that a revision > 0 will use LE (and the
device will obviously know that as well.)

Or am I misunderstanding what you're getting at?


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-04 20:01                       ` Michael S. Tsirkin
  2021-10-05  7:38                         ` Cornelia Huck
@ 2021-10-05 11:17                         ` Halil Pasic
  2021-10-05 11:22                           ` Michael S. Tsirkin
  1 sibling, 1 reply; 52+ messages in thread
From: Halil Pasic @ 2021-10-05 11:17 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Cornelia Huck, Jason Wang, Xie Yongji, virtualization,
	linux-kernel, markver, Christian Borntraeger, linux-s390,
	virtio-dev, Halil Pasic

On Mon, 4 Oct 2021 16:01:12 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> > 
> > Ok, so what about something like
> > 
> > "If FEATURES_OK is not set, the driver MAY change the set of features it
> > accepts."
> > 
> > in the device initialization section?  
> 
> Maybe "as long as". However Halil implied that some features are not
> turned off properly if that happens. Halil could you pls provide
> some examples?



static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
{
...
    if (virtio_has_feature(features, VIRTIO_NET_F_STANDBY)) {
        qapi_event_send_failover_negotiated(n->netclient_name);
        qatomic_set(&n->failover_primary_hidden, false);
        failover_add_primary(n, &err);
        if (err) {
            warn_report_err(err);
        }
    }
}

This is probably the only one in QEMU. Back then I stopped looking
after the first hit.
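
To spell out the pattern I mean, here is a stripped-down model. The
struct and both sketch_* functions are made up for illustration; only
the bit number of VIRTIO_NET_F_STANDBY (62) is taken from the headers,
and the second variant is just one conceivable alternative, not what
QEMU does:

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_F_STANDBY 62 /* VIRTIO_NET_F_STANDBY bit number */

struct sketch_net {
	bool failover_active; /* side effect triggered by the feature */
};

/* Same shape as the handler above: acts when the bit is set, never undoes. */
static void sketch_set_features_sticky(struct sketch_net *n, uint64_t features)
{
	if (features & (1ULL << SKETCH_F_STANDBY))
		n->failover_active = true;
}

/* Variant that follows the most recent write, so a later clear is honoured. */
static void sketch_set_features_tracking(struct sketch_net *n,
					 uint64_t features)
{
	n->failover_active = (features & (1ULL << SKETCH_F_STANDBY)) != 0;
}
```

With the sticky variant, writing the features once with STANDBY set and
then again without it leaves failover enabled, which is why a driver
cannot reliably fence the feature by rewriting the features before
FEATURES_OK.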

Regards,
Halil

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-05 11:13                   ` Cornelia Huck
@ 2021-10-05 11:20                     ` Michael S. Tsirkin
  2021-10-05 11:59                     ` Halil Pasic
  1 sibling, 0 replies; 52+ messages in thread
From: Michael S. Tsirkin @ 2021-10-05 11:20 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Halil Pasic, linux-s390, markver, Christian Borntraeger,
	qemu-devel, linux-kernel, virtualization, Xie Yongji, stefanha,
	Raphael Norwitz

On Tue, Oct 05, 2021 at 01:13:31PM +0200, Cornelia Huck wrote:
> On Tue, Oct 05 2021, Halil Pasic <pasic@linux.ibm.com> wrote:
> 
> > On Mon, 4 Oct 2021 05:07:13 -0400
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >> Well we established that we can know. Here's an alternative explanation:
> >
> >
> > I thin we established how this should be in the future, where a transport
> > specific mechanism is used to decide are we operating in legacy mode or
> > in modern mode. But with the current QEMU reality, I don't think so.
> > Namely currently the switch native-endian config -> little endian config
> > happens when the VERSION_1 is negotiated, which may happen whenever
> > the VERSION_1 bit is changed, or only when FEATURES_OK is set
> > (vhost-user).
> >
> > This is consistent with device should detect a legacy driver by checking
> > for VERSION_1, which is what the spec currently says.
> >
> > So for transitional we start out with native-endian config. For modern
> > only the config is always LE.
> >
> > The guest can distinguish between a legacy only device and a modern
> > capable device after the revision negotiation. A legacy device would
> > reject the CCW.
> >
> > But both a transitional device and a modern only device would accept
> > a revision > 0. So the guest does not know for ccw.
> 
> Well, for pci I think the driver knows that it is using either legacy or
> modern, no?
> 
> And for ccw, the driver knows at that point in time which revision it
> negotiated, so it should know that a revision > 0 will use LE (and the
> device will obviously know that as well.)
> 
> Or am I misunderstanding what you're getting at?

Exactly what I'm saying.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-05 11:17                         ` Halil Pasic
@ 2021-10-05 11:22                           ` Michael S. Tsirkin
  2021-10-05 15:20                             ` Cornelia Huck
  0 siblings, 1 reply; 52+ messages in thread
From: Michael S. Tsirkin @ 2021-10-05 11:22 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Cornelia Huck, Jason Wang, Xie Yongji, virtualization,
	linux-kernel, markver, Christian Borntraeger, linux-s390,
	virtio-dev

On Tue, Oct 05, 2021 at 01:17:51PM +0200, Halil Pasic wrote:
> On Mon, 4 Oct 2021 16:01:12 -0400
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > > 
> > > Ok, so what about something like
> > > 
> > > "If FEATURES_OK is not set, the driver MAY change the set of features it
> > > accepts."
> > > 
> > > in the device initialization section?  
> > 
> > Maybe "as long as". However Halil implied that some features are not
> > turned off properly if that happens. Halil could you pls provide
> > some examples?
> 
> 
> 
> static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
> {
> ...
>     if (virtio_has_feature(features, VIRTIO_NET_F_STANDBY)) {
>         qapi_event_send_failover_negotiated(n->netclient_name);
>         qatomic_set(&n->failover_primary_hidden, false);
>         failover_add_primary(n, &err);
>         if (err) {
>             warn_report_err(err);
>         }
>     }
> }
> 
> This is probably the only one in QEMU. Back then I stopped looking
> after the first hit.
> 
> Regards,
> Halil

Hmm ok more failover issues :(
This stuff really should be moved to set_status.

-- 
MST


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-05 11:13                   ` Cornelia Huck
  2021-10-05 11:20                     ` Michael S. Tsirkin
@ 2021-10-05 11:59                     ` Halil Pasic
  2021-10-05 15:25                       ` Cornelia Huck
  1 sibling, 1 reply; 52+ messages in thread
From: Halil Pasic @ 2021-10-05 11:59 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Michael S. Tsirkin, linux-s390, markver, Christian Borntraeger,
	qemu-devel, linux-kernel, virtualization, Xie Yongji, stefanha,
	Raphael Norwitz, Halil Pasic

On Tue, 05 Oct 2021 13:13:31 +0200
Cornelia Huck <cohuck@redhat.com> wrote:

> On Tue, Oct 05 2021, Halil Pasic <pasic@linux.ibm.com> wrote:
> 
> > On Mon, 4 Oct 2021 05:07:13 -0400
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:  
> >> Well we established that we can know. Here's an alternative explanation:  
> >
> >
> > I think we established how this should be in the future, where a transport
> > specific mechanism is used to decide are we operating in legacy mode or
> > in modern mode. But with the current QEMU reality, I don't think so.
> > Namely currently the switch native-endian config -> little endian config
> > happens when the VERSION_1 is negotiated, which may happen whenever
> > the VERSION_1 bit is changed, or only when FEATURES_OK is set
> > (vhost-user).
> >
> > This is consistent with device should detect a legacy driver by checking
> > for VERSION_1, which is what the spec currently says.
> >
> > So for transitional we start out with native-endian config. For modern
> > only the config is always LE.
> >
> > The guest can distinguish between a legacy only device and a modern
> > capable device after the revision negotiation. A legacy device would
> > reject the CCW.
> >
> > But both a transitional device and a modern only device would accept
> > a revision > 0. So the guest does not know for ccw.  
> 
> Well, for pci I think the driver knows that it is using either legacy or
> modern, no?

It is mighty complicated. virtio-blk-pci-non-transitional and
virtio-net-pci-non-transitional will give you BE, but
virtio-crypto-pci, which is also non-transitional, will get you LE
before VERSION_1 is set (because virtio-crypto uses stl_le_p()). That is
a fact.

The deal is that virtio-blk and virtio-net were written with
transitional in mind, and the config code is the same for transitional
and non-transitional.

That is how things are now. With the QEMU changes things will be simpler.
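
To make the difference concrete, here is a minimal, self-contained sketch.
It is not QEMU code: the toy_* helpers are hypothetical stand-ins, and the
guest_big_endian flag is an assumption used to model a BE guest such as
s390x. It only contrasts a config store whose byte order follows VERSION_1
with one that is always LE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for byte-order store helpers; not QEMU's API. */
static void toy_store_le32(uint8_t *p, uint32_t v)
{
    assert(p);
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

static void toy_store_be32(uint8_t *p, uint32_t v)
{
    assert(p);
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * (3 - i)));
}

/*
 * virtio-blk/virtio-net style (as described above): the byte order of a
 * config field depends on whether VERSION_1 has been set, so a big endian
 * guest sees BE data until then.
 */
static void toy_store_cfg32_by_version(uint8_t *p, uint32_t v,
                                       bool version_1_set,
                                       bool guest_big_endian)
{
    if (version_1_set || !guest_big_endian)
        toy_store_le32(p, v);
    else
        toy_store_be32(p, v);
}

/* virtio-crypto style (as described above): always little endian. */
static void toy_store_cfg32_always_le(uint8_t *p, uint32_t v)
{
    toy_store_le32(p, v);
}
```

With a value of 512 and a BE guest, the first helper yields the bytes
00 00 02 00 before VERSION_1 is set and 00 02 00 00 afterwards, while the
second helper always yields 00 02 00 00.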

> 
> And for ccw, the driver knows at that point in time which revision it
> negotiated, so it should know that a revision > 0 will use LE (and the
> device will obviously know that as well.)

With the future changes in QEMU, yes. Without these changes, no. Without
these changes we get BE when the guest code thinks it is going to get
LE. That is what causes the regression.

The commit message for this patch is written from the perspective of
right now, and not from the perspective of future changes.

Or can you hack up a guest patch that looks at the revision, figures out
what endianness the early config access is in, and does the right thing?

I don't think so. I tried to explain why that is impossible. If it were
possible, that would be preferable to messing with the device and
introducing another exit.

> 
> Or am I misunderstanding what you're getting at?
> 

Probably. I'm talking about the state before "do transport specific legacy
detection in the device instead of looking at VERSION_1"; you are probably
talking about the post-state. If we had this new behavior for all relevant
hypervisors then we wouldn't need to do a thing in the guest. The current
code would work like a charm.

Does that answer your question?

Regards,
Halil


* Re: [virtio-dev] Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-05 11:22                           ` Michael S. Tsirkin
@ 2021-10-05 15:20                             ` Cornelia Huck
  0 siblings, 0 replies; 52+ messages in thread
From: Cornelia Huck @ 2021-10-05 15:20 UTC (permalink / raw)
  To: Michael S. Tsirkin, Halil Pasic
  Cc: Jason Wang, Xie Yongji, virtualization, linux-kernel, markver,
	Christian Borntraeger, linux-s390, virtio-dev, qemu-devel

On Tue, Oct 05 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Tue, Oct 05, 2021 at 01:17:51PM +0200, Halil Pasic wrote:
>> On Mon, 4 Oct 2021 16:01:12 -0400
>> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> 
>> > > 
>> > > Ok, so what about something like
>> > > 
>> > > "If FEATURES_OK is not set, the driver MAY change the set of features it
>> > > accepts."
>> > > 
>> > > in the device initialization section?  
>> > 
>> > Maybe "as long as". However Halil implied that some features are not
>> > turned off properly if that happens. Halil could you pls provide
>> > some examples?
>> 
>> 
>> 
>> static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
>> {
>> ...
>>     if (virtio_has_feature(features, VIRTIO_NET_F_STANDBY)) {
>>         qapi_event_send_failover_negotiated(n->netclient_name);
>>         qatomic_set(&n->failover_primary_hidden, false);
>>         failover_add_primary(n, &err);
>>         if (err) {
>>             warn_report_err(err);
>>         }
>>     }
>> }
>> 
>> This is probably the only one in QEMU. Back then I stopped looking
>> after the first hit.

After some grepping, I agree that this seems to be the only one.

>> 
>> Regards,
>> Halil
>
> Hmm ok more failover issues :(
> This stuff really should be moved to set_status.

Yes, F_STANDBY does not exist for legacy, so performing those actions
when FEATURES_OK is set looks like the right thing to do.
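
A minimal sketch of that idea, assuming a toy device model: struct
toy_net_dev and the toy_* functions are hypothetical and are not QEMU's
VirtIODevice API. The side effect is modelled as a counter and fires only
on the transition to FEATURES_OK, so rewriting the feature bits before
that point has no lasting effect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_CONFIG_S_FEATURES_OK 8u  /* status bit from the virtio spec */
#define VIRTIO_NET_F_STANDBY        62  /* feature bit from the virtio spec */

/* Hypothetical toy device; only the fields needed for the sketch. */
struct toy_net_dev {
    uint64_t guest_features;         /* last feature set written by driver */
    uint8_t  status;                 /* device status byte */
    int      failover_negotiated;    /* counts the failover side effect */
};

/* Writing features only records them; no side effects happen here. */
static void toy_set_features(struct toy_net_dev *d, uint64_t features)
{
    assert(d);
    if (d->status & VIRTIO_CONFIG_S_FEATURES_OK)
        return;                      /* features are frozen after FEATURES_OK */
    d->guest_features = features;
}

/* The side effect runs once, when FEATURES_OK goes from clear to set. */
static void toy_set_status(struct toy_net_dev *d, uint8_t status)
{
    assert(d);
    bool was_ok = (d->status & VIRTIO_CONFIG_S_FEATURES_OK) != 0;
    bool now_ok = (status & VIRTIO_CONFIG_S_FEATURES_OK) != 0;

    d->status = status;
    if (!was_ok && now_ok &&
        (d->guest_features & (1ULL << VIRTIO_NET_F_STANDBY)))
        d->failover_negotiated++;
}
```

In this model a driver that first writes F_STANDBY and then clears it
before setting FEATURES_OK never triggers the failover action.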



* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-05 11:59                     ` Halil Pasic
@ 2021-10-05 15:25                       ` Cornelia Huck
  0 siblings, 0 replies; 52+ messages in thread
From: Cornelia Huck @ 2021-10-05 15:25 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Michael S. Tsirkin, linux-s390, markver, Christian Borntraeger,
	qemu-devel, linux-kernel, virtualization, Xie Yongji, stefanha,
	Raphael Norwitz, Halil Pasic

On Tue, Oct 05 2021, Halil Pasic <pasic@linux.ibm.com> wrote:

> On Tue, 05 Oct 2021 13:13:31 +0200
> Cornelia Huck <cohuck@redhat.com> wrote:
>> Or am I misunderstanding what you're getting at?
>> 
>
> Probably. I'm talking about pre- "do transport specific legacy detection
> in the device instead of looking at VERSION_1" you are probably talking
> about the post-state. If we had this new behavior for all relevant
> hypervisors then we wouldn't need to do a thing in the guest. The current
> code would work like charm.

Yeah, I was thinking more about the desired state. We need to both fix
QEMU (and other VMMs or devices should check whether they are doing the
right thing) and add a workaround on the driver side to make it work
with existing QEMUs.
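
As a rough illustration of the driver-side workaround, here is a toy model,
not the actual kernel patch. The toy_* names are hypothetical. The device
is assumed to behave like current QEMU on a big endian guest, presenting BE
config until VERSION_1 has been written, while the driver assumes LE
because VERSION_1 was offered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_F_VERSION_1 32  /* feature bit from the virtio spec */

/* Hypothetical toy device seen by a big endian guest. */
struct toy_dev {
    uint64_t offered;   /* features offered by the device */
    uint64_t written;   /* features last written back by the driver */
    uint32_t blk_size;  /* a multi-byte config field */
};

/* Device side: BE until VERSION_1 has been written, LE afterwards. */
static void toy_dev_read_blk_size(const struct toy_dev *d, uint8_t out[4])
{
    bool le = (d->written & (1ULL << VIRTIO_F_VERSION_1)) != 0;

    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(d->blk_size >> (le ? 8 * i : 8 * (3 - i)));
}

/* Driver side: decode the field as LE, since VERSION_1 was offered. */
static uint32_t toy_decode_le32(const uint8_t b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Early config read in probe, with or without writing features first. */
static uint32_t toy_probe_read_blk_size(struct toy_dev *d, bool write_back_first)
{
    uint8_t raw[4];

    assert(d);
    if (write_back_first && (d->offered & (1ULL << VIRTIO_F_VERSION_1)))
        d->written = 1ULL << VIRTIO_F_VERSION_1;

    toy_dev_read_blk_size(d, raw);
    return toy_decode_le32(raw);
}
```

In this model, reading a block size of 512 without the write-back returns a
byte-swapped value, and reading it after the write-back returns 512.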



* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-04 19:17                 ` Michael S. Tsirkin
@ 2021-10-06 10:13                   ` Cornelia Huck
  2021-10-06 12:15                     ` Michael S. Tsirkin
  0 siblings, 1 reply; 52+ messages in thread
From: Cornelia Huck @ 2021-10-06 10:13 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Halil Pasic, Jason Wang, Xie Yongji, virtualization,
	linux-kernel, markver, Christian Borntraeger, linux-s390,
	qemu-devel

On Mon, Oct 04 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Mon, Oct 04, 2021 at 05:50:44PM +0200, Cornelia Huck wrote:
>> On Mon, Oct 04 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> 
>> > On Mon, Oct 04, 2021 at 04:33:21PM +0200, Cornelia Huck wrote:
>> >> On Mon, Oct 04 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> >> 
>> >> > On Mon, Oct 04, 2021 at 02:19:55PM +0200, Cornelia Huck wrote:
>> >> >> 
>> >> >> [cc:qemu-devel]
>> >> >> 
>> >> >> On Sat, Oct 02 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> >> >> 
>> >> >> > ok so that's a QEMU bug. Any virtio 1.0 and up
>> >> >> > compatible device must use LE.
>> >> >> > It can also present a legacy config space where the
>> >> >> > endian depends on the guest.
>> >> >> 
>> >> >> So, how is the virtio core supposed to determine this? A
>> >> >> transport-specific callback?
>> >> >
>> >> > I'd say a field in VirtIODevice is easiest.
>> >> 
>> >> The transport needs to set this as soon as it has figured out whether
>> >> we're using legacy or not.
>> >
>> > Basically on each device config access?
>> 
>> Prior to the first one, I think. It should not change again, should it?
>
> Well yes but we never prohibited someone from poking at both ..
> Doing it on each access means we don't have state to migrate.

Yes; if it isn't too high overhead, that's probably the safest way to
handle it.

>
>> >
>> >> I guess we also need to fence off any
>> >> accesses respectively error out the device if the driver tries any
>> >> read/write operations that would depend on that knowledge?
>> >> 
>> >> And using a field in VirtIODevice would probably need some care when
>> >> migrating. Hm...
>> >
>> > It's just a shorthand to minimize changes. No need to migrate I think.
>> 
>> If we migrate in from an older QEMU, we don't know whether we are
>> dealing with legacy or not, until feature negotiation is already
>> done... don't we have to ask the transport?
>
> Right but the only thing that can happen is config access.

Checking on each config space access would be enough then.

> Well and for legacy a kick I guess.

I think any driver that does something that is not config space access,
status access, or feature bit handling without VERSION_1 being set is
necessarily legacy? Does that really need special handling?
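
A minimal sketch of the "check on each config space access" approach
discussed above, assuming a hypothetical transport callback. None of the
toy_* names exist in QEMU. The ccw case uses the rule mentioned earlier in
the thread that revision 0 means legacy. Because nothing is cached in the
device, there is no extra state to migrate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ccw transport state: revision 0 means legacy. */
struct toy_ccw {
    unsigned revision;
};

static bool toy_ccw_is_legacy(const void *opaque)
{
    return ((const struct toy_ccw *)opaque)->revision == 0;
}

/* Hypothetical device: asks the transport instead of caching the answer. */
struct toy_device {
    bool (*is_legacy)(const void *opaque);  /* transport-specific callback */
    const void *opaque;                     /* transport state */
    bool guest_big_endian;                  /* models a BE guest */
    uint32_t blk_size;                      /* a multi-byte config field */
};

/* Each config read queries the transport and picks the byte order. */
static void toy_config_read32(const struct toy_device *d, uint8_t out[4])
{
    assert(d && d->is_legacy);

    /* Legacy interface: guest-native order. Modern interface: always LE. */
    bool be = d->is_legacy(d->opaque) && d->guest_big_endian;

    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(d->blk_size >> (be ? 8 * (3 - i) : 8 * i));
}
```

The cost is one indirect call per config access, which is the overhead
question raised above.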



* Re: [RFC PATCH 1/1] virtio: write back features before verify
  2021-10-06 10:13                   ` Cornelia Huck
@ 2021-10-06 12:15                     ` Michael S. Tsirkin
  0 siblings, 0 replies; 52+ messages in thread
From: Michael S. Tsirkin @ 2021-10-06 12:15 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Halil Pasic, Jason Wang, Xie Yongji, virtualization,
	linux-kernel, markver, Christian Borntraeger, linux-s390,
	qemu-devel

On Wed, Oct 06, 2021 at 12:13:14PM +0200, Cornelia Huck wrote:
> On Mon, Oct 04 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Mon, Oct 04, 2021 at 05:50:44PM +0200, Cornelia Huck wrote:
> >> On Mon, Oct 04 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >> 
> >> > On Mon, Oct 04, 2021 at 04:33:21PM +0200, Cornelia Huck wrote:
> >> >> On Mon, Oct 04 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >> >> 
> >> >> > On Mon, Oct 04, 2021 at 02:19:55PM +0200, Cornelia Huck wrote:
> >> >> >> 
> >> >> >> [cc:qemu-devel]
> >> >> >> 
> >> >> >> On Sat, Oct 02 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >> >> >> 
> >> >> >> > ok so that's a QEMU bug. Any virtio 1.0 and up
> >> >> >> > compatible device must use LE.
> >> >> >> > It can also present a legacy config space where the
> >> >> >> > endian depends on the guest.
> >> >> >> 
> >> >> >> So, how is the virtio core supposed to determine this? A
> >> >> >> transport-specific callback?
> >> >> >
> >> >> > I'd say a field in VirtIODevice is easiest.
> >> >> 
> >> >> The transport needs to set this as soon as it has figured out whether
> >> >> we're using legacy or not.
> >> >
> >> > Basically on each device config access?
> >> 
> >> Prior to the first one, I think. It should not change again, should it?
> >
> > Well yes but we never prohibited someone from poking at both ..
> > Doing it on each access means we don't have state to migrate.
> 
> Yes; if it isn't too high overhead, that's probably the safest way to
> handle it.
> 
> >
> >> >
> >> >> I guess we also need to fence off any
> >> >> accesses respectively error out the device if the driver tries any
> >> >> read/write operations that would depend on that knowledge?
> >> >> 
> >> >> And using a field in VirtIODevice would probably need some care when
> >> >> migrating. Hm...
> >> >
> >> > It's just a shorthand to minimize changes. No need to migrate I think.
> >> 
> >> If we migrate in from an older QEMU, we don't know whether we are
> >> dealing with legacy or not, until feature negotiation is already
> >> done... don't we have to ask the transport?
> >
> > Right but the only thing that can happen is config access.
> 
> Checking on each config space access would be enough then.
> 
> > Well and for legacy a kick I guess.
> 
> I think any driver that does something that is not config space access,
> status access, or feature bit handling without VERSION_1 being set is
> necessarily legacy? Does that really need special handling?

Likely not, I just wanted to be exact.

-- 
MST



end of thread (last message: 2021-10-06 12:15 UTC)

Thread overview: 52+ messages
2021-09-30  1:20 [RFC PATCH 1/1] virtio: write back features before verify Halil Pasic
2021-09-30  8:04 ` Christian Borntraeger
2021-09-30  9:28 ` Cornelia Huck
2021-09-30 11:03   ` Halil Pasic
2021-09-30 11:31     ` Cornelia Huck
2021-10-01 14:22       ` Halil Pasic
2021-10-01 15:18         ` Cornelia Huck
2021-10-02 18:13           ` Michael S. Tsirkin
2021-10-04  2:23             ` Halil Pasic
2021-10-04  9:07               ` Michael S. Tsirkin
2021-10-05 10:06                 ` Cornelia Huck
2021-10-05 10:43                 ` Halil Pasic
2021-10-05 11:11                   ` Michael S. Tsirkin
2021-10-05 11:13                   ` Cornelia Huck
2021-10-05 11:20                     ` Michael S. Tsirkin
2021-10-05 11:59                     ` Halil Pasic
2021-10-05 15:25                       ` Cornelia Huck
2021-10-04  7:01             ` Cornelia Huck
2021-10-04  9:25               ` Halil Pasic
2021-10-04  9:51                 ` Cornelia Huck
2021-10-02 12:09       ` Michael S. Tsirkin
2021-09-30 11:12 ` Michael S. Tsirkin
2021-09-30 11:36   ` Cornelia Huck
2021-10-02 18:20     ` Michael S. Tsirkin
2021-10-03  5:00       ` Halil Pasic
2021-10-03  6:42         ` Michael S. Tsirkin
2021-10-03  7:26           ` Michael S. Tsirkin
2021-10-04 12:01             ` Cornelia Huck
2021-10-04 12:54               ` Michael S. Tsirkin
2021-10-04 14:27                 ` Cornelia Huck
2021-10-04 15:05                   ` Michael S. Tsirkin
2021-10-04 15:45                     ` [virtio-dev] " Cornelia Huck
2021-10-04 20:01                       ` Michael S. Tsirkin
2021-10-05  7:38                         ` Cornelia Huck
2021-10-05 11:17                         ` Halil Pasic
2021-10-05 11:22                           ` Michael S. Tsirkin
2021-10-05 15:20                             ` Cornelia Huck
2021-10-01  7:21   ` Halil Pasic
2021-10-02 10:21     ` Michael S. Tsirkin
2021-10-04 12:19       ` Cornelia Huck
2021-10-04 13:11         ` Michael S. Tsirkin
2021-10-04 14:33           ` Cornelia Huck
2021-10-04 15:07             ` Michael S. Tsirkin
2021-10-04 15:50               ` Cornelia Huck
2021-10-04 19:17                 ` Michael S. Tsirkin
2021-10-06 10:13                   ` Cornelia Huck
2021-10-06 12:15                     ` Michael S. Tsirkin
2021-10-05  7:25           ` Halil Pasic
2021-10-05  7:53             ` Michael S. Tsirkin
2021-10-05 10:46               ` Halil Pasic
2021-10-05 11:11                 ` Michael S. Tsirkin
2021-10-01 14:34 ` Christian Borntraeger
