LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH] Align PCI memory regions to page size (4K) - Fix
@ 2007-10-28 17:27 Barak Fargoun
  2007-10-28 19:31 ` Greg KH
  2007-10-28 19:48 ` Arjan van de Ven
  0 siblings, 2 replies; 15+ messages in thread
From: Barak Fargoun @ 2007-10-28 17:27 UTC (permalink / raw)
  To: linux-kernel, linux-pci; +Cc: gregkh, Barak Fargoun, Guy Zana

Sorry, my mail client corrupted the previous mail containing the patch... it should be ok now...

Add a boot parameter (‘pci-mem-align’) which forces PCI memory regions to be aligned to 4K. 

This is very useful when developing an hypervisor, since in case we want to let native domains direct access to specific hardware, we don’t want PCI devices to share their memory region page with other devices. In Xen for example, PCI devices mmio resources are mapped by remapping complete pages (by Intel VT-d & the Neocleus pass-through patch for Xen).

Signed-off-by: Barak Fargoun     (barak@neocleus.com)
---
Kernel version: 2.6.18	(since this is the kernel version used in Xen for Dom-0)
---
diff -r 840b9df48b6a drivers/pci/bus.c
--- a/drivers/pci/bus.c	Tue Aug 07 09:37:41 2007 +0100
+++ b/drivers/pci/bus.c	Sun Oct 28 08:40:52 2007 -0400
@@ -16,6 +16,8 @@
 #include <linux/init.h>
 
 #include "pci.h"
+
+extern int pci_mem_align;
 
 /**
  * pci_bus_alloc_resource - allocate a resource from a parent bus
@@ -43,6 +45,15 @@ pci_bus_alloc_resource(struct pci_bus *b
 	int i, ret = -ENOMEM;
 
 	type_mask |= IORESOURCE_IO | IORESOURCE_MEM;
+
+	/* if the boot parameter 'pci-mem-align' was specified, then we need to 
+		align the memory addresses, at page size alignment  */
+	if (pci_mem_align)
+	{
+		/* we change only alignments which are smaller than page size */
+		if (align < (PAGE_SIZE-1))
+			align = PAGE_SIZE - 1;
+	}
 
 	for (i = 0; i < PCI_BUS_NUM_RESOURCES; i++) {
 		struct resource *r = bus->resource[i];
diff -r 840b9df48b6a drivers/pci/quirks.c
--- a/drivers/pci/quirks.c	Tue Aug 07 09:37:41 2007 +0100
+++ b/drivers/pci/quirks.c	Sun Oct 28 08:40:52 2007 -0400
@@ -22,6 +22,54 @@
 #include <linux/delay.h>
 #include <linux/acpi.h>
 #include "pci.h"
+
+/* a global flag, which signals if we should align PCI mem windows to 4K */
+int pci_mem_align = 0;
+
+/* this function is called, if the 'pci-mem-align' was specified as a boot
+		parameter. Used to force the kernel to align PCI mem windows to 4k */
+static int __init set_pci_mem_align(char *str)
+{
+	pci_mem_align = 1;
+	return 1;
+}
+__setup("pci-mem-align", set_pci_mem_align);
+
+/*	
+	This quirk function enables us to force all memory resources which are 
+		assigned to PCI devices, to be aligned at a page size. This 
+		fetaurs helps us in xen when running native domains
+*/
+static void __devinit quirk_xen_align_mem_resources(struct pci_dev *dev)
+{
+	int i;
+	struct resource *r;
+
+	/* if the boot parameter 'pci-mem-align' wasn't specified, then no need
+		to align the memory addresses */
+	if (pci_mem_align == 0)
+		return;
+
+	for (i=0; i < DEVICE_COUNT_RESOURCE; ++i) {
+		r = &dev->resource[i];
+
+		if (r == NULL)
+			continue;
+
+		/* we need to adjust only memory resources */
+		if (r->flags & IORESOURCE_MEM) {
+			/* align the start address in page size alignment alignment (round up) */
+			resource_size_t old_start = r->start;
+			r->start = (r->start + (PAGE_SIZE-1))  & (~(PAGE_SIZE-1));
+
+			/* update the end address, so that the region size will not be 
+				affected */
+			r->end = r->end - (old_start - r->start);
+		}
+	}
+}
+DECLARE_PCI_FIXUP_HEADER(PCI_ANY_ID, PCI_ANY_ID, quirk_xen_align_mem_resources);
+
 
 /* The Mellanox Tavor device gives false positive parity errors
  * Mark this device with a broken_parity_status, to allow

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Align PCI memory regions to page size (4K) - Fix
  2007-10-28 17:27 [PATCH] Align PCI memory regions to page size (4K) - Fix Barak Fargoun
@ 2007-10-28 19:31 ` Greg KH
  2007-10-28 19:53   ` Barak Fargoun
  2007-10-28 19:48 ` Arjan van de Ven
  1 sibling, 1 reply; 15+ messages in thread
From: Greg KH @ 2007-10-28 19:31 UTC (permalink / raw)
  To: Barak Fargoun; +Cc: linux-kernel, linux-pci, Guy Zana

On Sun, Oct 28, 2007 at 01:27:27PM -0400, Barak Fargoun wrote:
> Sorry, my mail client corrupted the previous mail containing the patch... it should be ok now...
> 
> Add a boot parameter (?pci-mem-align?) which forces PCI memory regions
> to be aligned to 4K. 
> 
> This is very useful when developing an hypervisor, since in case we
> want to let native domains direct access to specific hardware, we
> don?t want PCI devices to share their memory region page with other
> devices. In Xen for example, PCI devices mmio resources are mapped by
> remapping complete pages (by Intel VT-d & the Neocleus pass-through
> patch for Xen).
> 
> Signed-off-by: Barak Fargoun???? (barak@neocleus.com)
> ---
> Kernel version: 2.6.18	(since this is the kernel version used in Xen for Dom-0)
> ---
> diff -r 840b9df48b6a drivers/pci/bus.c
> --- a/drivers/pci/bus.c	Tue Aug 07 09:37:41 2007 +0100
> +++ b/drivers/pci/bus.c	Sun Oct 28 08:40:52 2007 -0400
> @@ -16,6 +16,8 @@
>  #include <linux/init.h>
>  
>  #include "pci.h"
> +
> +extern int pci_mem_align;

Please don't declare externs in a .c file, put it in a header file.

>  /**
>   * pci_bus_alloc_resource - allocate a resource from a parent bus
> @@ -43,6 +45,15 @@ pci_bus_alloc_resource(struct pci_bus *b
>  	int i, ret = -ENOMEM;
>  
>  	type_mask |= IORESOURCE_IO | IORESOURCE_MEM;
> +
> +	/* if the boot parameter 'pci-mem-align' was specified, then we need to 
> +		align the memory addresses, at page size alignment  */
> +	if (pci_mem_align)
> +	{
> +		/* we change only alignments which are smaller than page size */
> +		if (align < (PAGE_SIZE-1))
> +			align = PAGE_SIZE - 1;
> +	}

Please follow the proper coding style rules as documented in
Documentation/CodingStyle.  Also running the script scrips/checkpatch.pl
would have pointed this out.

>  
>  	for (i = 0; i < PCI_BUS_NUM_RESOURCES; i++) {
>  		struct resource *r = bus->resource[i];
> diff -r 840b9df48b6a drivers/pci/quirks.c
> --- a/drivers/pci/quirks.c	Tue Aug 07 09:37:41 2007 +0100
> +++ b/drivers/pci/quirks.c	Sun Oct 28 08:40:52 2007 -0400
> @@ -22,6 +22,54 @@
>  #include <linux/delay.h>
>  #include <linux/acpi.h>
>  #include "pci.h"
> +
> +/* a global flag, which signals if we should align PCI mem windows to 4K */
> +int pci_mem_align = 0;
> +
> +/* this function is called, if the 'pci-mem-align' was specified as a boot
> +		parameter. Used to force the kernel to align PCI mem windows to 4k */

All command line paramaters need to be documented in the Documentation
directory.

> +static int __init set_pci_mem_align(char *str)
> +{
> +	pci_mem_align = 1;
> +	return 1;
> +}
> +__setup("pci-mem-align", set_pci_mem_align);
> +
> +/*	
> +	This quirk function enables us to force all memory resources which are 
> +		assigned to PCI devices, to be aligned at a page size. This 
> +		fetaurs helps us in xen when running native domains

Trailing spaces are not allowed :(

> +*/
> +static void __devinit quirk_xen_align_mem_resources(struct pci_dev *dev)
> +{
> +	int i;
> +	struct resource *r;
> +
> +	/* if the boot parameter 'pci-mem-align' wasn't specified, then no need
> +		to align the memory addresses */
> +	if (pci_mem_align == 0)
> +		return;
> +
> +	for (i=0; i < DEVICE_COUNT_RESOURCE; ++i) {
> +		r = &dev->resource[i];
> +
> +		if (r == NULL)
> +			continue;
> +
> +		/* we need to adjust only memory resources */
> +		if (r->flags & IORESOURCE_MEM) {
> +			/* align the start address in page size alignment alignment (round up) */
> +			resource_size_t old_start = r->start;
> +			r->start = (r->start + (PAGE_SIZE-1))  & (~(PAGE_SIZE-1));
> +
> +			/* update the end address, so that the region size will not be 
> +				affected */
> +			r->end = r->end - (old_start - r->start);
> +		}
> +	}
> +}
> +DECLARE_PCI_FIXUP_HEADER(PCI_ANY_ID, PCI_ANY_ID, quirk_xen_align_mem_resources);

Now for the main reason.  I really don't understand why this is needed.
Is it only for virtual machines?  Are there drivers in the kernel tree
today that need this functionality?  What breaks if you do not align
these resources?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Align PCI memory regions to page size (4K) - Fix
  2007-10-28 17:27 [PATCH] Align PCI memory regions to page size (4K) - Fix Barak Fargoun
  2007-10-28 19:31 ` Greg KH
@ 2007-10-28 19:48 ` Arjan van de Ven
  1 sibling, 0 replies; 15+ messages in thread
From: Arjan van de Ven @ 2007-10-28 19:48 UTC (permalink / raw)
  To: Barak Fargoun; +Cc: linux-kernel, linux-pci, gregkh, Barak Fargoun, Guy Zana

On Sun, 28 Oct 2007 13:27:27 -0400
"Barak Fargoun" <barak@neocleus.com> wrote:

> Sorry, my mail client corrupted the previous mail containing the
> patch... it should be ok now...
> 
> Add a boot parameter (‘pci-mem-align’) which forces PCI memory
> regions to be aligned to 4K. 
> 
> This is very useful when developing an hypervisor, since in case we
> want to let native domains direct access to specific hardware, we
> don’t want PCI devices to share their memory region page with other
> devices. In Xen for example, PCI devices mmio resources are mapped by
> remapping complete pages (by Intel VT-d & the Neocleus pass-through
> patch for Xen).
> 
> Signed-off-by: Barak Fargoun     (barak@neocleus.com)
> ---
> Kernel version: 2.6.18	(since this is the kernel version used
> in Xen for Dom-0) ---


I appreciate that Xen is about 2 years behind the curve and only uses
2.6.18, but what do you expect us on this list to do with this patch?
2.6.18 is no longer maintained: it doesn't get -stable updates anymore.
(And if it did, I doubt this kind of change would be suitable for a
-stable release anyway)...

so the first step really is to make a patch against 2.6.24-rc or a
later git version.... because then and only then is your patch
something that anyone else here can DO something with ...

(it's like the tree in the forest... if a tree falls in a forest, but
everyone moved to a different forest 2 years ago, does it make a sound?)

Greetings,
   Arjan van de Ven

-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH] Align PCI memory regions to page size (4K) - Fix
  2007-10-28 19:31 ` Greg KH
@ 2007-10-28 19:53   ` Barak Fargoun
  2007-10-28 20:03     ` Greg KH
  0 siblings, 1 reply; 15+ messages in thread
From: Barak Fargoun @ 2007-10-28 19:53 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel, linux-pci, Guy Zana

Hi!

Regarding all the technical stuff (documentation, coding style, etc.) -
I thought I did it correctly :( I will fix it ASAP, 
and send an update when I will finish it.

About your question: today, some of the hypervisors are using linux
kernel as their domain-0 (e.g. Xen). In order
to implement direct hardware access for these native domains (e.g.
running windows in a virtual machine above Xen),
the PCI memory regions should be aligned at-least at the page-level (so,
a virtual machine - can't see data of other
devices which may not be assigned to it). So, for that reason, we wanted
a boot parameter to let us force the kernel
to align PCI memory regions at-least at a PAGE_SIZE alignment. It is
very useful for hypervisors which are developed 
at Linux environment (e.g.: Xen).

I hope I made myself clear.

Barak

-----Original Message-----
From: Greg KH [mailto:gregkh@suse.de] 
Sent: Sunday, October 28, 2007 21:31 PM
To: Barak Fargoun
Cc: linux-kernel@vger.kernel.org; linux-pci@atrey.karlin.mff.cuni.cz;
Guy Zana
Subject: Re: [PATCH] Align PCI memory regions to page size (4K) - Fix

On Sun, Oct 28, 2007 at 01:27:27PM -0400, Barak Fargoun wrote:
> Sorry, my mail client corrupted the previous mail containing the
patch... it should be ok now...
> 
> Add a boot parameter (?pci-mem-align?) which forces PCI memory regions
> to be aligned to 4K. 
> 
> This is very useful when developing an hypervisor, since in case we
> want to let native domains direct access to specific hardware, we
> don?t want PCI devices to share their memory region page with other
> devices. In Xen for example, PCI devices mmio resources are mapped by
> remapping complete pages (by Intel VT-d & the Neocleus pass-through
> patch for Xen).
> 
> Signed-off-by: Barak Fargoun???? (barak@neocleus.com)
> ---
> Kernel version: 2.6.18	(since this is the kernel version used
in Xen for Dom-0)
> ---
> diff -r 840b9df48b6a drivers/pci/bus.c
> --- a/drivers/pci/bus.c	Tue Aug 07 09:37:41 2007 +0100
> +++ b/drivers/pci/bus.c	Sun Oct 28 08:40:52 2007 -0400
> @@ -16,6 +16,8 @@
>  #include <linux/init.h>
>  
>  #include "pci.h"
> +
> +extern int pci_mem_align;

Please don't declare externs in a .c file, put it in a header file.

>  /**
>   * pci_bus_alloc_resource - allocate a resource from a parent bus
> @@ -43,6 +45,15 @@ pci_bus_alloc_resource(struct pci_bus *b
>  	int i, ret = -ENOMEM;
>  
>  	type_mask |= IORESOURCE_IO | IORESOURCE_MEM;
> +
> +	/* if the boot parameter 'pci-mem-align' was specified, then we
need to 
> +		align the memory addresses, at page size alignment  */
> +	if (pci_mem_align)
> +	{
> +		/* we change only alignments which are smaller than page
size */
> +		if (align < (PAGE_SIZE-1))
> +			align = PAGE_SIZE - 1;
> +	}

Please follow the proper coding style rules as documented in
Documentation/CodingStyle.  Also running the script scrips/checkpatch.pl
would have pointed this out.

>  
>  	for (i = 0; i < PCI_BUS_NUM_RESOURCES; i++) {
>  		struct resource *r = bus->resource[i];
> diff -r 840b9df48b6a drivers/pci/quirks.c
> --- a/drivers/pci/quirks.c	Tue Aug 07 09:37:41 2007 +0100
> +++ b/drivers/pci/quirks.c	Sun Oct 28 08:40:52 2007 -0400
> @@ -22,6 +22,54 @@
>  #include <linux/delay.h>
>  #include <linux/acpi.h>
>  #include "pci.h"
> +
> +/* a global flag, which signals if we should align PCI mem windows to
4K */
> +int pci_mem_align = 0;
> +
> +/* this function is called, if the 'pci-mem-align' was specified as a
boot
> +		parameter. Used to force the kernel to align PCI mem
windows to 4k */

All command line paramaters need to be documented in the Documentation
directory.

> +static int __init set_pci_mem_align(char *str)
> +{
> +	pci_mem_align = 1;
> +	return 1;
> +}
> +__setup("pci-mem-align", set_pci_mem_align);
> +
> +/*	
> +	This quirk function enables us to force all memory resources
which are 
> +		assigned to PCI devices, to be aligned at a page size.
This 
> +		fetaurs helps us in xen when running native domains

Trailing spaces are not allowed :(

> +*/
> +static void __devinit quirk_xen_align_mem_resources(struct pci_dev
*dev)
> +{
> +	int i;
> +	struct resource *r;
> +
> +	/* if the boot parameter 'pci-mem-align' wasn't specified, then
no need
> +		to align the memory addresses */
> +	if (pci_mem_align == 0)
> +		return;
> +
> +	for (i=0; i < DEVICE_COUNT_RESOURCE; ++i) {
> +		r = &dev->resource[i];
> +
> +		if (r == NULL)
> +			continue;
> +
> +		/* we need to adjust only memory resources */
> +		if (r->flags & IORESOURCE_MEM) {
> +			/* align the start address in page size
alignment alignment (round up) */
> +			resource_size_t old_start = r->start;
> +			r->start = (r->start + (PAGE_SIZE-1))  &
(~(PAGE_SIZE-1));
> +
> +			/* update the end address, so that the region
size will not be 
> +				affected */
> +			r->end = r->end - (old_start - r->start);
> +		}
> +	}
> +}
> +DECLARE_PCI_FIXUP_HEADER(PCI_ANY_ID, PCI_ANY_ID,
quirk_xen_align_mem_resources);

Now for the main reason.  I really don't understand why this is needed.
Is it only for virtual machines?  Are there drivers in the kernel tree
today that need this functionality?  What breaks if you do not align
these resources?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Align PCI memory regions to page size (4K) - Fix
  2007-10-28 19:53   ` Barak Fargoun
@ 2007-10-28 20:03     ` Greg KH
  2007-10-28 20:44       ` Barak Fargoun
                         ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Greg KH @ 2007-10-28 20:03 UTC (permalink / raw)
  To: Barak Fargoun; +Cc: linux-kernel, linux-pci, Guy Zana

On Sun, Oct 28, 2007 at 03:53:20PM -0400, Barak Fargoun wrote:
> Hi!
> 
> Regarding all the technical stuff (documentation, coding style, etc.) -
> I thought I did it correctly :( I will fix it ASAP, 
> and send an update when I will finish it.
> 
> About your question: today, some of the hypervisors are using linux
> kernel as their domain-0 (e.g. Xen). In order to implement direct
> hardware access for these native domains (e.g.  running windows in a
> virtual machine above Xen), the PCI memory regions should be aligned
> at-least at the page-level (so, a virtual machine - can't see data of
> other devices which may not be assigned to it). So, for that reason,
> we wanted a boot parameter to let us force the kernel to align PCI
> memory regions at-least at a PAGE_SIZE alignment. It is very useful
> for hypervisors which are developed at Linux environment (e.g.: Xen).

But doesn't aligning such regions on that alignment break some devices
as that is not what the device is asking for in the BIOS?

And if not, why would we not do this for all devices not just for
virtual machines, if it is such a benefit?

Also, how does this play with the hardware IOMMU chips that provide such
virtualization in hardware for you?

And, we can't accept a patch for 2.6.18, there is no development tree to
apply it to anymore, that is a dead kernel tree.  It needs to be against
2.6.24-rc1 at the latest to have a chance for approval.

Is this a patch that distros are shipping in their Xen versions?

And how does this play with KVM?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH] Align PCI memory regions to page size (4K) - Fix
  2007-10-28 20:03     ` Greg KH
@ 2007-10-28 20:44       ` Barak Fargoun
  2007-10-29  1:08       ` David Miller
  2007-10-29  5:52       ` Grant Grundler
  2 siblings, 0 replies; 15+ messages in thread
From: Barak Fargoun @ 2007-10-28 20:44 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel, linux-pci, Guy Zana

> From: Greg KH [mailto:gregkh@suse.de]
> Sent: Sunday, October 28, 2007 10:04 PM
> To: Barak Fargoun
> Cc: linux-kernel@vger.kernel.org;
> linux-pci@atrey.karlin.mff.cuni.cz; Guy Zana
> Subject: Re: [PATCH] Align PCI memory regions to page size (4K) - Fix
> 
> On Sun, Oct 28, 2007 at 03:53:20PM -0400, Barak Fargoun wrote:
> > Hi!
> > 
> > Regarding all the technical stuff (documentation, coding
> style, etc.)
> > - I thought I did it correctly :( I will fix it ASAP, and send an 
> > update when I will finish it.
> > 
> > About your question: today, some of the hypervisors are using linux 
> > kernel as their domain-0 (e.g. Xen). In order to implement direct 
> > hardware access for these native domains (e.g.  running
> windows in a
> > virtual machine above Xen), the PCI memory regions should
> be aligned
> > at-least at the page-level (so, a virtual machine - can't
> see data of
> > other devices which may not be assigned to it). So, for
> that reason,
> > we wanted a boot parameter to let us force the kernel to align PCI 
> > memory regions at-least at a PAGE_SIZE alignment. It is very useful 
> > for hypervisors which are developed at Linux environment
> (e.g.: Xen).
> 
> But doesn't aligning such regions on that alignment break some devices

> as that is not what the device is asking for in the BIOS?

No, it shouldn't break. If for example, a device request an alignment of
order 7, it will be provided if you will supply an alignment of larger
order (say 10 bits). An alignment of order that is bigger than 12 bits
is already page aligned, so we do not touch it in that case.

> 
> And if not, why would we not do this for all devices not just for 
> virtual machines, if it is such a benefit?

Since if a device asks for just 1K, the rest of the page (in mmio space)
will be empty if you'll page align it. The other resource will be in
another page and it consumes a separate page. It'll just consume
additional mmio space for nothing.

> 
> Also, how does this play with the hardware IOMMU chips that provide 
> such virtualization in hardware for you?

Actually we are not using an IOMMU, we use a 1:1 mapping that was
developed by Neocleus (for Xen right now), but that holds the same for
Intel VT-d as well.
IOMMUs are there to translate accesses to RAM from PCI devices, we are
more concerned about accesses by the host (cpu).
By using pci-mem-align, we assure that both the virtualized hardware &
the real hardware resources will be page aligned, this way we can remap
those pages so the HVM could safely access the hardware.

> 
> And, we can't accept a patch for 2.6.18, there is no development tree 
> to apply it to anymore, that is a dead kernel tree.  It needs to be 
> against
> 2.6.24-rc1 at the latest to have a chance for approval.
> 

Of course!
If you'll find it useful, we can make a progress and test it on the
latest rev and also do the changes that you asked.

> Is this a patch that distros are shipping in their Xen versions?

Actually, no one knows about it yet :-)

> 
> And how does this play with KVM?

Since we are working on Xen right now (dom0 is 2.6.18) and this feature
might be useful for other hypervisors that will implement pass-through
in the future (AFAIK KVM doesn't support PCI pass-through yet) we
thought to release it for Linux in general.

> 
> thanks,
> 
> greg k-h
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Align PCI memory regions to page size (4K) - Fix
  2007-10-28 20:03     ` Greg KH
  2007-10-28 20:44       ` Barak Fargoun
@ 2007-10-29  1:08       ` David Miller
  2007-11-13 21:17         ` Benjamin Herrenschmidt
  2007-10-29  5:52       ` Grant Grundler
  2 siblings, 1 reply; 15+ messages in thread
From: David Miller @ 2007-10-29  1:08 UTC (permalink / raw)
  To: gregkh; +Cc: barak, linux-kernel, linux-pci, guy

From: Greg KH <gregkh@suse.de>
Date: Sun, 28 Oct 2007 13:03:36 -0700

> But doesn't aligning such regions on that alignment break some devices
> as that is not what the device is asking for in the BIOS?

There are also platforms where bootup firmware chooses the
allocations, and that is what we use no matter what.  This is so that
when breaking into the firmware for debugging the firmware can still
access the console device where it had mapped it in the first place.

We really can't do things this way.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Align PCI memory regions to page size (4K) - Fix
  2007-10-28 20:03     ` Greg KH
  2007-10-28 20:44       ` Barak Fargoun
  2007-10-29  1:08       ` David Miller
@ 2007-10-29  5:52       ` Grant Grundler
  2007-11-08 23:24         ` Linas Vepstas
  2 siblings, 1 reply; 15+ messages in thread
From: Grant Grundler @ 2007-10-29  5:52 UTC (permalink / raw)
  To: Greg KH; +Cc: Barak Fargoun, linux-kernel, linux-pci, Guy Zana

On Sun, Oct 28, 2007 at 01:03:36PM -0700, Greg KH wrote:
> On Sun, Oct 28, 2007 at 03:53:20PM -0400, Barak Fargoun wrote:
...
> > About your question: today, some of the hypervisors are using linux
> > kernel as their domain-0 (e.g. Xen). In order to implement direct
> > hardware access for these native domains (e.g.  running windows in a
> > virtual machine above Xen), the PCI memory regions should be aligned
> > at-least at the page-level (so, a virtual machine - can't see data of
> > other devices which may not be assigned to it). So, for that reason,
> > we wanted a boot parameter to let us force the kernel to align PCI
> > memory regions at-least at a PAGE_SIZE alignment. It is very useful
> > for hypervisors which are developed at Linux environment (e.g.: Xen).
...
> And if not, why would we not do this for all devices not just for
> virtual machines, if it is such a benefit?

It's a benefit IFF multiple devices are spread across more than one guest
_and_ we don't trust every particating guest to play nicely with IO.  That way
the Hypervisor can assign one device to a specific guest OS for direct access.
E.g. 4 port Gige card could directly support the host and 3 guests with somewhat
lower risk of tromping on each other's MMIO space.

If Xen is cooperative, this seems a bit paranoid. I don't recall ever seeing a
driver bug where the driver accidentally poked MMIO space at the wrong device.
That's much more common for IO Port space. The only exception was xfree86 where it
poked around in random places to deal with buggy HW quirks.

The use of an IOMMU will provide much more useful protection.

grant

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Align PCI memory regions to page size (4K) - Fix
  2007-10-29  5:52       ` Grant Grundler
@ 2007-11-08 23:24         ` Linas Vepstas
  2007-11-12 23:43           ` Grant Grundler
  0 siblings, 1 reply; 15+ messages in thread
From: Linas Vepstas @ 2007-11-08 23:24 UTC (permalink / raw)
  To: Grant Grundler; +Cc: Greg KH, Barak Fargoun, linux-kernel, linux-pci, Guy Zana

On Sun, Oct 28, 2007 at 11:52:16PM -0600, Grant Grundler wrote:
> > On Sun, Oct 28, 2007 at 03:53:20PM -0400, Barak Fargoun wrote:
> ...
> > > About your question: today, some of the hypervisors are using linux
> > > kernel as their domain-0 (e.g. Xen). In order to implement direct
> > > hardware access for these native domains (e.g.  running windows in a
> > > virtual machine above Xen), the PCI memory regions should be aligned
> > > at-least at the page-level (so, a virtual machine - can't see data of
> > > other devices which may not be assigned to it). So, for that reason,
> > > we wanted a boot parameter to let us force the kernel to align PCI
> > > memory regions at-least at a PAGE_SIZE alignment. It is very useful
> > > for hypervisors which are developed at Linux environment (e.g.: Xen).
> 
> It's a benefit IFF multiple devices are spread across more than one guest
> _and_ we don't trust every particating guest to play nicely with IO.  That way
> the Hypervisor can assign one device to a specific guest OS for direct access.
> E.g. 4 port Gige card could directly support the host and 3 guests with somewhat
> lower risk of tromping on each other's MMIO space.
> 
> If Xen is cooperative, this seems a bit paranoid. I don't recall ever seeing a
> driver bug where the driver accidentally poked MMIO space at the wrong device.

I presume the issue is not a driver bug per-se, but a
spying/hacking-type security issue: Having root in one guest could in
principle allow one to write a driver that snooped on data in other
guests, and/or intentionally corrupted data on other guests.

I envision some ISP renting out 1/3 of a machine with a 4-port card,
and having some nosey college-kid wannabe hacker getting root on one of
the guests and causing trouble.  But perhaps I'm waaaayyyyy off base
here.

(Just like occasional cigarette smoking is known to inevitably lead to
full-fledged heroin addiction, I am pretty sure that the culture of
"cheat codes" among 12-year-olds is going to lead to an epidemic of
hackers in about 10 years. I am atuned to "wannabe hacker culture"). 

--linas

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Align PCI memory regions to page size (4K) - Fix
  2007-11-08 23:24         ` Linas Vepstas
@ 2007-11-12 23:43           ` Grant Grundler
  0 siblings, 0 replies; 15+ messages in thread
From: Grant Grundler @ 2007-11-12 23:43 UTC (permalink / raw)
  To: Linas Vepstas
  Cc: Grant Grundler, Greg KH, Barak Fargoun, linux-kernel, linux-pci,
	Guy Zana

On Thu, Nov 08, 2007 at 05:24:00PM -0600, Linas Vepstas wrote:
...
> > E.g. 4 port Gige card could directly support the host and 3 guests with somewhat
> > lower risk of tromping on each other's MMIO space.
> > 
> > If Xen is cooperative, this seems a bit paranoid. I don't recall ever seeing a
> > driver bug where the driver accidentally poked MMIO space at the wrong device.
> 
> I presume the issue is not a driver bug per-se, but a
> spying/hacking-type security issue: Having root in one guest could in
> principle allow one to write a driver that snooped on data in other
> guests, and/or intentionally corrupted data on other guests.

If someone has root on a guest, they could modprobe a driver that
can map any unused virtual address to any physical address they want.
Unless the chipset somehow blocks/refuses to route IO for that guest,
then they can still poke at any other device once they figure out
where addresses are being routed (e.g. directly reading configuration
space or directly accessing chipset specific registers.)

> I envision some ISP renting out 1/3 of a machine with a 4-port card,
> and having some nosey college-kid wannabe hacker getting root on one of
> the guests and causing trouble.  But perhaps I'm waaaayyyyy off base
> here.

I agree this will make it slightly harder. Also makes it much more likely the
box will crash - taking down all the guests. And someone should notice that.

> (Just like occasional cigarette smoking is known to inevitably lead to
> full-fledged heroin addiction, I am pretty sure that the culture of
> "cheat codes" among 12-year-olds is going to lead to an epidemic of
> hackers in about 10 years. I am atuned to "wannabe hacker culture"). 

Ok - but I think there are more serious issues if someone can get
root on a remote box (ignore Virtualization). Several other possible
layers of security have already been "defeated" by then.

thanks,
grant

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Align PCI memory regions to page size (4K) - Fix
  2007-10-29  1:08       ` David Miller
@ 2007-11-13 21:17         ` Benjamin Herrenschmidt
  2007-11-14  6:21           ` Grant Grundler
  0 siblings, 1 reply; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2007-11-13 21:17 UTC (permalink / raw)
  To: David Miller; +Cc: gregkh, barak, linux-kernel, linux-pci, guy


On Sun, 2007-10-28 at 18:08 -0700, David Miller wrote:
> From: Greg KH <gregkh@suse.de>
> Date: Sun, 28 Oct 2007 13:03:36 -0700
> 
> > But doesn't aligning such regions on that alignment break some devices
> > as that is not what the device is asking for in the BIOS?
> 
> There are also platforms where bootup firmware chooses the
> allocations, and that is what we use no matter what.  This is so that
> when breaking into the firmware for debugging the firmware can still
> access the console device where it had mapped it in the first place.
> 
> We really can't do things this way.

Agreed.

Though he's trying to fix a real issue, his patch is not the right
approach imho.

A better approach would be to have a mechanism to be triggered by the
hypervisor administration tools that will attempt to reassign -that-
specific device if it happens to share pages with another.

The remapping would thus only happen for that single device, after it's
been put in control of the hypervisor (no host driver is bound, maybe
just an HV specific "bridging" driver is), and only when the action of
assigning it to the partition is performed, so that if the machine
crashes as a result, at least you know why :-)

So something like your hypervisor binds a special driver to a device
that is to be reflected to a partition, at which point we are sure no
other driver is using it, then that driver can call something in the pci
layer that attempts to re-assign the device resources to keep it in a
separate page.

A patch implementing such a helper, and maybe reserving the rest of the
MMIO page via some dummy resource to make sure nobody else gets in, that
would make more sense.

Ben.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Align PCI memory regions to page size (4K) - Fix
  2007-11-13 21:17         ` Benjamin Herrenschmidt
@ 2007-11-14  6:21           ` Grant Grundler
  2007-11-14  8:16             ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 15+ messages in thread
From: Grant Grundler @ 2007-11-14  6:21 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: David Miller, gregkh, barak, linux-kernel, linux-pci, guy

On Wed, Nov 14, 2007 at 08:17:33AM +1100, Benjamin Herrenschmidt wrote:
...
> Though he's trying to fix a real issue, his patch is not the right
> approach imho.
> 
> A better approach would be to have a mechanism to be triggered by the
> hypervisor administration tools that will attempt to reassign -that-
> specific device if it happens to share pages with another.
> 
> The remapping would thus only happen for that single device, after it's
> been put in control of the hypervisor (no host driver is bound, maybe
> just an HV specific "bridging" driver is), and only when the action of
> assigning it to the partition is performed, so that if the machine
> crashes as a result, at least you know why :-)
> 
> So something like your hypervisor binds a special driver to a device
> that is to be reflected to a partition, at which point we are sure no
> other driver is using it, then that driver can call something in the pci
> layer that attempts to re-assign the device resources to keep it in a
> separate page.

We already have that "something": pci_enable_device().
The guest OS "Arch" code can then do the reassignment.
See "3.1 Enable the PCI device" in Documentation/pci.txt.


> A patch implementing such a helper, and maybe reserving the rest of the
> MMIO page via some dummy resource to make sure nobody else gets in, that
> would make more sense.

The Hypervisor could be responsible for making the right devices
visible to the appropriate partitions/guests by intercepting the
PCI bus walk and/or hotplug support. I don't think you
should need any dummy resource/drivers in the guest OS.

hth,
grant

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Align PCI memory regions to page size (4K) - Fix
  2007-11-14  6:21           ` Grant Grundler
@ 2007-11-14  8:16             ` Benjamin Herrenschmidt
  2007-11-14 21:55               ` Grant Grundler
  0 siblings, 1 reply; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2007-11-14  8:16 UTC (permalink / raw)
  To: Grant Grundler; +Cc: David Miller, gregkh, barak, linux-kernel, linux-pci, guy


On Tue, 2007-11-13 at 23:21 -0700, Grant Grundler wrote:
> 
> > So something like your hypervisor binds a special driver to a device
> > that is to be reflected to a partition, at which point we are sure
> no
> > other driver is using it, then that driver can call something in the
> pci
> > layer that attempts to re-assign the device resources to keep it in
> a
> > separate page.
> 
> We already have that "something": pci_enable_device().
> The guest OS "Arch" code can then do the reassignment.
> See "3.1 Enable the PCI device" in Documentation/pci.txt.

No, that can't be done there because that would mean the guest OS has
the ability to reassign resources of PCI devices which is really not
something you want. You really want the host OS to do that -before- it
makes the device visible to the guest imho.

> The Hypervisor could be responsible for making the right devices
> visible to the appropriate partitions/guests by intercepting the
> PCI bus walk and/or hotplug support. I don't think you
> should need any dummy resource/drivers in the guest OS.

I'm not talking about a dummy resource/driver in the guest OS, but
rather, in the host. When a PCI device, that is visible to the host, is
to be "reflected" into a guest, it makes sense to have a proxy driver
take over in the host from a resource management point of view, thus
making this device effectively "used" and thus preventing another
process or partition from trying to bind to it. That proxy driver can
then be responsible for doing the appropriate resource tweaking before
making the device effectively visible to the guest. It might make sense
to provide a helper in the PCI layer to make that easier tho.

Ben.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Align PCI memory regions to page size (4K) - Fix
  2007-11-14  8:16             ` Benjamin Herrenschmidt
@ 2007-11-14 21:55               ` Grant Grundler
  2007-11-14 22:16                 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 15+ messages in thread
From: Grant Grundler @ 2007-11-14 21:55 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Grant Grundler, David Miller, gregkh, barak, linux-kernel,
	linux-pci, guy

On Wed, Nov 14, 2007 at 07:16:18PM +1100, Benjamin Herrenschmidt wrote:
> > We already have that "something": pci_enable_device().
> > The guest OS "Arch" code can then do the reassignment.
> > See "3.1 Enable the PCI device" in Documentation/pci.txt.
> 
> No, that can't be done there because that would mean the guest OS has
> the ability to reassign resources of PCI devices which is really not
> something you want. You really want the host OS to do that -before- it
> makes the device visible to the guest imho.

Agreed. I didn't explain well enough. The assignment doesn't need
to happen until that point. My assumption was the host OS (or Hypervisor)
would already know what it's supposed to be before then and just
set it up at that point.

> 
> > The Hypervisor could be responsible for making the right devices
> > visible to the appropriate partitions/guests by intercepting the
> > PCI bus walk and/or hotplug support. I don't think you
> > should need any dummy resource/drivers in the guest OS.
> 
> I'm not talking about a dummy resource/driver in the guest OS, but
> rather, in the host. When a PCI device, that is visible to the host, is
> to be "reflected" into a guest, it makes sense to have a proxy driver
> take over in the host from a resource management point of view, thus
> making this device effectively "used" and thus preventing another
> process or partition from trying to bind to it. That proxy driver can
> then be responsible for doing the appropriate resource tweaking before
> making the device effectively visible to the guest. It might make sense
> to provide a helper in the PCI layer to make that easier tho.

Ah ok.. I was assuming there was only a "Hypervisor" and all the "guests"
were equal. If one OS instance is "Host" and can see the device before hand,
then yeah, it makes sense to "hide" the device from the normal device drivers.

grant
"Host OS
> 
> Ben.
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Align PCI memory regions to page size (4K) - Fix
  2007-11-14 21:55               ` Grant Grundler
@ 2007-11-14 22:16                 ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2007-11-14 22:16 UTC (permalink / raw)
  To: Grant Grundler; +Cc: David Miller, gregkh, barak, linux-kernel, linux-pci, guy


On Wed, 2007-11-14 at 14:55 -0700, Grant Grundler wrote:

> Ah ok.. I was assuming there was only a "Hypervisor" and all the "guests"
> were equal. If one OS instance is "Host" and can see the device before hand,
> then yeah, it makes sense to "hide" the device from the normal device drivers.

The "Host" is the hypervisor in something like KVM or lguest which is
all I really care about :-) On Xen, it would be dom0 I suppose.

Ben.



^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2007-11-14 22:17 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-10-28 17:27 [PATCH] Align PCI memory regions to page size (4K) - Fix Barak Fargoun
2007-10-28 19:31 ` Greg KH
2007-10-28 19:53   ` Barak Fargoun
2007-10-28 20:03     ` Greg KH
2007-10-28 20:44       ` Barak Fargoun
2007-10-29  1:08       ` David Miller
2007-11-13 21:17         ` Benjamin Herrenschmidt
2007-11-14  6:21           ` Grant Grundler
2007-11-14  8:16             ` Benjamin Herrenschmidt
2007-11-14 21:55               ` Grant Grundler
2007-11-14 22:16                 ` Benjamin Herrenschmidt
2007-10-29  5:52       ` Grant Grundler
2007-11-08 23:24         ` Linas Vepstas
2007-11-12 23:43           ` Grant Grundler
2007-10-28 19:48 ` Arjan van de Ven

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).