LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* Oops in 2.6.23-rc1-git9, arch/x86_64/pci/k8-bus.c::fill_mp_bus_to_cpumask()
@ 2007-08-03 22:10 Chuck Ebbert
  2007-08-03 22:50 ` Andrew Morton
  0 siblings, 1 reply; 35+ messages in thread
From: Chuck Ebbert @ 2007-08-03 22:10 UTC (permalink / raw)
  To: Muli Ben-Yehuda; +Cc: linux-kernel, riku.seppala

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=250859

at line 74:

muli@62829: 
muli@62829: 					sd = bus->sysdata;
muli@62829: 					sd->node = node;   <=====

bus->sysdata is NULL.

Last changed by this hunk of
"x86-64: introduce struct pci_sysdata to facilitate sharing of ->sysdata":

@@ -67,7 +69,9 @@ fill_mp_bus_to_cpumask(void)
 						continue;
 					if (!node_online(node))
 						node = 0;
-					bus->sysdata = (void *)node;
+
+					sd = bus->sysdata;
+					sd->node = node;
 				}		
 			}
 		}


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Oops in 2.6.23-rc1-git9, arch/x86_64/pci/k8-bus.c::fill_mp_bus_to_cpumask()
  2007-08-03 22:10 Oops in 2.6.23-rc1-git9, arch/x86_64/pci/k8-bus.c::fill_mp_bus_to_cpumask() Chuck Ebbert
@ 2007-08-03 22:50 ` Andrew Morton
  2007-08-04  6:17   ` Muli Ben-Yehuda
  2007-08-04  9:30   ` Andi Kleen
  0 siblings, 2 replies; 35+ messages in thread
From: Andrew Morton @ 2007-08-03 22:50 UTC (permalink / raw)
  To: Chuck Ebbert
  Cc: Muli Ben-Yehuda, linux-kernel, riku.seppala, Andy Whitcroft, Andi Kleen

On Fri, 03 Aug 2007 18:10:03 -0400
Chuck Ebbert <cebbert@redhat.com> wrote:

> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=250859
> 
> at line 74:
> 
> muli@62829: 
> muli@62829: 					sd = bus->sysdata;
> muli@62829: 					sd->node = node;   <=====
> 
> bus->sysdata is NULL.
> 
> Last changed by this hunk of
> "x86-64: introduce struct pci_sysdata to facilitate sharing of ->sysdata":
> 
> @@ -67,7 +69,9 @@ fill_mp_bus_to_cpumask(void)
>  						continue;
>  					if (!node_online(node))
>  						node = 0;
> -					bus->sysdata = (void *)node;
> +
> +					sd = bus->sysdata;
> +					sd->node = node;
>  				}		
>  			}
>  		}
> 

Andy keeps trotting out a patch which will probably fix this, but for some
reason it doesn't seem to make progress.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Oops in 2.6.23-rc1-git9, arch/x86_64/pci/k8-bus.c::fill_mp_bus_to_cpumask()
  2007-08-03 22:50 ` Andrew Morton
@ 2007-08-04  6:17   ` Muli Ben-Yehuda
  2007-08-04  9:30   ` Andi Kleen
  1 sibling, 0 replies; 35+ messages in thread
From: Muli Ben-Yehuda @ 2007-08-04  6:17 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: Chuck Ebbert, linux-kernel, riku.seppala, Andi Kleen,
	Andrew Morton, Jeff Garzik

On Fri, Aug 03, 2007 at 03:50:35PM -0700, Andrew Morton wrote:
> On Fri, 03 Aug 2007 18:10:03 -0400
> Chuck Ebbert <cebbert@redhat.com> wrote:
> 
> > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=250859
> > 
> > at line 74:
> > 
> > muli@62829: 
> > muli@62829: 					sd = bus->sysdata;
> > muli@62829: 					sd->node = node;   <=====
> > 
> > bus->sysdata is NULL.
> > 
> > Last changed by this hunk of
> > "x86-64: introduce struct pci_sysdata to facilitate sharing of ->sysdata":
> > 
> > @@ -67,7 +69,9 @@ fill_mp_bus_to_cpumask(void)
> >  						continue;
> >  					if (!node_online(node))
> >  						node = 0;
> > -					bus->sysdata = (void *)node;
> > +
> > +					sd = bus->sysdata;
> > +					sd->node = node;
> >  				}		
> >  			}
> >  		}
> > 
> 
> Andy keeps trotting out a patch which will probably fix this, but
> for some reason it doesn't seem to make progress.

Hmm, looks like we're missing one of the paths where sd gets
allocated.

Andy can you please point me to the latest version of your patch? I
recall thinking that it papers over the bug rather than fixing it but
would like to take a second look.

Thanks,
Muli


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Oops in 2.6.23-rc1-git9, arch/x86_64/pci/k8-bus.c::fill_mp_bus_to_cpumask()
  2007-08-03 22:50 ` Andrew Morton
  2007-08-04  6:17   ` Muli Ben-Yehuda
@ 2007-08-04  9:30   ` Andi Kleen
  2007-08-04 16:32     ` Andrew Morton
  1 sibling, 1 reply; 35+ messages in thread
From: Andi Kleen @ 2007-08-04  9:30 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Chuck Ebbert, Muli Ben-Yehuda, linux-kernel, riku.seppala,
	Andy Whitcroft

On Saturday 04 August 2007 00:50, Andrew Morton wrote:
> On Fri, 03 Aug 2007 18:10:03 -0400
>
> Chuck Ebbert <cebbert@redhat.com> wrote:
> > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=250859
> >
> > at line 74:
> >
> > muli@62829:
> > muli@62829: 					sd = bus->sysdata;
> > muli@62829: 					sd->node = node;   <=====
> >
> > bus->sysdata is NULL.
> >
> > Last changed by this hunk of
> > "x86-64: introduce struct pci_sysdata to facilitate sharing of
> > ->sysdata":

Hmm, will double check. Perhaps Muli's conversion was incomplete.

> > @@ -67,7 +69,9 @@ fill_mp_bus_to_cpumask(void)
> >  						continue;
> >  					if (!node_online(node))
> >  						node = 0;
> > -					bus->sysdata = (void *)node;
> > +
> > +					sd = bus->sysdata;
> > +					sd->node = node;
> >  				}
> >  			}
> >  		}
>
> Andy keeps trotting out a patch which will probably fix this,

What patch do you mean? I don't have anything sysdata related
left over.

-Andi

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Oops in 2.6.23-rc1-git9, arch/x86_64/pci/k8-bus.c::fill_mp_bus_to_cpumask()
  2007-08-04  9:30   ` Andi Kleen
@ 2007-08-04 16:32     ` Andrew Morton
  2007-08-04 17:45       ` Yinghai Lu
  2007-08-04 23:40       ` Andi Kleen
  0 siblings, 2 replies; 35+ messages in thread
From: Andrew Morton @ 2007-08-04 16:32 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Chuck Ebbert, Muli Ben-Yehuda, linux-kernel, riku.seppala,
	Andy Whitcroft

On Sat, 4 Aug 2007 11:30:41 +0200 Andi Kleen <ak@suse.de> wrote:

> On Saturday 04 August 2007 00:50, Andrew Morton wrote:
> > On Fri, 03 Aug 2007 18:10:03 -0400
> >
> > Chuck Ebbert <cebbert@redhat.com> wrote:
> > > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=250859
> > >
> > > at line 74:
> > >
> > > muli@62829:
> > > muli@62829: 					sd = bus->sysdata;
> > > muli@62829: 					sd->node = node;   <=====
> > >
> > > bus->sysdata is NULL.
> > >
> > > Last changed by this hunk of
> > > "x86-64: introduce struct pci_sysdata to facilitate sharing of
> > > ->sysdata":
> 
> Hmm, will double check. Perhaps Muli's conversion was incomplete.

hm.

> > > @@ -67,7 +69,9 @@ fill_mp_bus_to_cpumask(void)
> > >  						continue;
> > >  					if (!node_online(node))
> > >  						node = 0;
> > > -					bus->sysdata = (void *)node;
> > > +
> > > +					sd = bus->sysdata;
> > > +					sd->node = node;
> > >  				}
> > >  			}
> > >  		}
> >
> > Andy keeps trotting out a patch which will probably fix this,
> 
> What patch do you mean? I don't have anything sysdata related
> left over.
> 

"pci device ensure sysdata initialised", now at version 4.



From: Andy Whitcroft <apw@shadowen.org>

We have been seeing panic's on NUMA systems in pci_call_probe() in
2.6.19-rc1-mm1 and later.  This is related to the changes introduced in the
commit below:

    [x86, PCI] Switch pci_bus::sysdata from NUMA node integer to a pointer
    0a247a58fc3e2ecfc17654301033e8b8d08df2a2

In this change the sysdata has changed from directly representing a value
(the node number in NUMA) to a pointer to a structure.  However, it seems
that we do not always initialise this sysdata before we probe the device.

Prior to the changes above the node was defaulted to 'NULL' allocating the
devices to node 0 unconditionally.  This patch adds a default sysdata entry
(pci_default_sysdata), this is then used where 'NULL' was used previously. 
pci_default_sysdata defaults the node to unknown (-1).  This is a more
accurate assignment, mirroring the value returned where no topology support
is provided and no locality information is available.

There are only two uses of this value in the affected architectures
(x86, x86_64) and generic code:

1) in x86_64, dma_alloc_pages() looks up the node in order to
   allocate node local memory.  Here if the node is invalid we
   will default to the first online node.  Behaviour here should
   be unchanged.
2) in generic, pci_call_probe() looks up the node in order to
   restrict execution of the probe on the card local node, to
   favor node local allocation.  Where this is unknown previously
   we would force execution (and thereby allocation) to node 0,
   this is arguably wrong and using -1 releases this restriction.

In an ideal world we should be supplying a sysdata for the
appropriate node where it is known.  Where it is not known defaulting
to -1 seems a better course, and would help us where node 0 is
short of memory.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Yinghai Lu <yinghai.lu@sun.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/i386/pci/common.c   |    2 ++
 arch/i386/pci/fixup.c    |    8 +++++---
 arch/i386/pci/numa.c     |    8 +++++---
 arch/i386/pci/visws.c    |    4 ++--
 include/asm-i386/pci.h   |    1 +
 include/asm-x86_64/pci.h |    1 +
 6 files changed, 16 insertions(+), 8 deletions(-)

diff -puN arch/i386/pci/common.c~pci-device-ensure-sysdata-initialised-v4 arch/i386/pci/common.c
--- a/arch/i386/pci/common.c~pci-device-ensure-sysdata-initialised-v4
+++ a/arch/i386/pci/common.c
@@ -27,6 +27,8 @@ unsigned long pirq_table_addr;
 struct pci_bus *pci_root_bus;
 struct pci_raw_ops *raw_pci_ops;
 
+struct pci_sysdata pci_default_sysdata = { .node = -1 };
+
 static int pci_read(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *value)
 {
 	return raw_pci_ops->read(0, bus->number, devfn, where, size, value);
diff -puN arch/i386/pci/fixup.c~pci-device-ensure-sysdata-initialised-v4 arch/i386/pci/fixup.c
--- a/arch/i386/pci/fixup.c~pci-device-ensure-sysdata-initialised-v4
+++ a/arch/i386/pci/fixup.c
@@ -25,9 +25,11 @@ static void __devinit pci_fixup_i450nx(s
 		pci_read_config_byte(d, reg++, &subb);
 		DBG("i450NX PXB %d: %02x/%02x/%02x\n", pxb, busno, suba, subb);
 		if (busno)
-			pci_scan_bus(busno, &pci_root_ops, NULL);	/* Bus A */
+			pci_scan_bus(busno, &pci_root_ops,
+					&pci_default_sysdata);	/* Bus A */
 		if (suba < subb)
-			pci_scan_bus(suba+1, &pci_root_ops, NULL);	/* Bus B */
+			pci_scan_bus(suba+1, &pci_root_ops,
+					&pci_default_sysdata);	/* Bus B */
 	}
 	pcibios_last_bus = -1;
 }
@@ -42,7 +44,7 @@ static void __devinit pci_fixup_i450gx(s
 	u8 busno;
 	pci_read_config_byte(d, 0x4a, &busno);
 	printk(KERN_INFO "PCI: i440KX/GX host bridge %s: secondary bus %02x\n", pci_name(d), busno);
-	pci_scan_bus(busno, &pci_root_ops, NULL);
+	pci_scan_bus(busno, &pci_root_ops, &pci_default_sysdata);
 	pcibios_last_bus = -1;
 }
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82454GX, pci_fixup_i450gx);
diff -puN arch/i386/pci/numa.c~pci-device-ensure-sysdata-initialised-v4 arch/i386/pci/numa.c
--- a/arch/i386/pci/numa.c~pci-device-ensure-sysdata-initialised-v4
+++ a/arch/i386/pci/numa.c
@@ -97,9 +97,11 @@ static void __devinit pci_fixup_i450nx(s
 		pci_read_config_byte(d, reg++, &subb);
 		DBG("i450NX PXB %d: %02x/%02x/%02x\n", pxb, busno, suba, subb);
 		if (busno)
-			pci_scan_bus(QUADLOCAL2BUS(quad,busno), &pci_root_ops, NULL);	/* Bus A */
+			pci_scan_bus(QUADLOCAL2BUS(quad,busno), &pci_root_ops,
+					&pci_default_sysdata);	/* Bus A */
 		if (suba < subb)
-			pci_scan_bus(QUADLOCAL2BUS(quad,suba+1), &pci_root_ops, NULL);	/* Bus B */
+			pci_scan_bus(QUADLOCAL2BUS(quad,suba+1), &pci_root_ops,
+					&pci_default_sysdata);	/* Bus B */
 	}
 	pcibios_last_bus = -1;
 }
@@ -124,7 +126,7 @@ static int __init pci_numa_init(void)
 			printk("Scanning PCI bus %d for quad %d\n", 
 				QUADLOCAL2BUS(quad,0), quad);
 			pci_scan_bus(QUADLOCAL2BUS(quad,0), 
-				&pci_root_ops, NULL);
+				&pci_root_ops, &pci_default_sysdata);
 		}
 	return 0;
 }
diff -puN arch/i386/pci/visws.c~pci-device-ensure-sysdata-initialised-v4 arch/i386/pci/visws.c
--- a/arch/i386/pci/visws.c~pci-device-ensure-sysdata-initialised-v4
+++ a/arch/i386/pci/visws.c
@@ -101,8 +101,8 @@ static int __init pcibios_init(void)
 		"bridge B (PIIX4) bus: %u\n", pci_bus1, pci_bus0);
 
 	raw_pci_ops = &pci_direct_conf1;
-	pci_scan_bus(pci_bus0, &pci_root_ops, NULL);
-	pci_scan_bus(pci_bus1, &pci_root_ops, NULL);
+	pci_scan_bus(pci_bus0, &pci_root_ops, &pci_default_sysdata);
+	pci_scan_bus(pci_bus1, &pci_root_ops, &pci_default_sysdata);
 	pci_fixup_irqs(visws_swizzle, visws_map_irq);
 	pcibios_resource_survey();
 	return 0;
diff -puN include/asm-i386/pci.h~pci-device-ensure-sysdata-initialised-v4 include/asm-i386/pci.h
--- a/include/asm-i386/pci.h~pci-device-ensure-sysdata-initialised-v4
+++ a/include/asm-i386/pci.h
@@ -7,6 +7,7 @@
 struct pci_sysdata {
 	int		node;		/* NUMA node */
 };
+extern struct pci_sysdata pci_default_sysdata;
 
 #include <linux/mm.h>		/* for struct page */
 
diff -puN include/asm-x86_64/pci.h~pci-device-ensure-sysdata-initialised-v4 include/asm-x86_64/pci.h
--- a/include/asm-x86_64/pci.h~pci-device-ensure-sysdata-initialised-v4
+++ a/include/asm-x86_64/pci.h
@@ -9,6 +9,7 @@ struct pci_sysdata {
 	int		node;		/* NUMA node */
 	void*		iommu;		/* IOMMU private data */
 };
+extern struct pci_sysdata pci_default_sysdata;
 
 #ifdef CONFIG_CALGARY_IOMMU
 static inline void* pci_iommu(struct pci_bus *bus)
_


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Oops in 2.6.23-rc1-git9, arch/x86_64/pci/k8-bus.c::fill_mp_bus_to_cpumask()
  2007-08-04 16:32     ` Andrew Morton
@ 2007-08-04 17:45       ` Yinghai Lu
  2007-08-04 18:15         ` Andrew Morton
  2007-08-04 23:40       ` Andi Kleen
  1 sibling, 1 reply; 35+ messages in thread
From: Yinghai Lu @ 2007-08-04 17:45 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andi Kleen, Chuck Ebbert, Muli Ben-Yehuda, linux-kernel,
	riku.seppala, Andy Whitcroft

On 8/4/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Sat, 4 Aug 2007 11:30:41 +0200 Andi Kleen <ak@suse.de> wrote:
>
> > On Saturday 04 August 2007 00:50, Andrew Morton wrote:
> > > On Fri, 03 Aug 2007 18:10:03 -0400
> > >
> > > Chuck Ebbert <cebbert@redhat.com> wrote:
> > > > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=250859
> > > >
> > > > at line 74:
> > > >
> > > > muli@62829:
> > > > muli@62829:                                       sd = bus->sysdata;
> > > > muli@62829:                                       sd->node = node;   <=====
> > > >
> > > > bus->sysdata is NULL.
> > > >
> > > > Last changed by this hunk of
> > > > "x86-64: introduce struct pci_sysdata to facilitate sharing of
> > > > ->sysdata":
> >
> > Hmm, will double check. Perhaps Muli's conversion was incomplete.
>
> hm.
>
> > > > @@ -67,7 +69,9 @@ fill_mp_bus_to_cpumask(void)
> > > >                                           continue;
> > > >                                   if (!node_online(node))
> > > >                                           node = 0;
> > > > -                                 bus->sysdata = (void *)node;
> > > > +
> > > > +                                 sd = bus->sysdata;
> > > > +                                 sd->node = node;
> > > >                           }
> > > >                   }
> > > >           }
> > >
> > > Andy keeps trotting out a patch which will probably fix this,
> >
> > What patch do you mean? I don't have anything sysdata related
> > left over.
> >
>
> "pci device ensure sysdata initialised", now at version 4.
>
>
>
> From: Andy Whitcroft <apw@shadowen.org>
>
> We have been seeing panic's on NUMA systems in pci_call_probe() in
> 2.6.19-rc1-mm1 and later.  This is related to the changes introduced in the
> commit below:
>
>     [x86, PCI] Switch pci_bus::sysdata from NUMA node integer to a pointer
>     0a247a58fc3e2ecfc17654301033e8b8d08df2a2
>
> In this change the sysdata has changed from directly representing a value
> (the node number in NUMA) to a pointer to a structure.  However, it seems
> that we do not always initialise this sysdata before we probe the device.
>
> Prior to the changes above the node was defaulted to 'NULL' allocating the
> devices to node 0 unconditionally.  This patch adds a default sysdata entry
> (pci_default_sysdata), this is then used where 'NULL' was used previously.
> pci_default_sysdata defaults the node to unknown (-1).  This is a more
> accurate assignment, mirroring the value returned where no topology support
> is provided and no locality information is available.
>
> There are only two uses of this value in the affected architectures
> (x86, x86_64) and generic code:
>
> 1) in x86_64, dma_alloc_pages() looks up the node in order to
>    allocate node local memory.  Here if the node is invalid we
>    will default to the first online node.  Behaviour here should
>    be unchanged.
> 2) in generic, pci_call_probe() looks up the node in order to
>    restrict execution of the probe on the card local node, to
>    favor node local allocation.  Where this is unknown previously
>    we would force execution (and thereby allocation) to node 0,
>    this is arguably wrong and using -1 releases this restriction.
>
> In an ideal world we should be supplying a sysdata for the
> appropriate node where it is known.  Where it is not known defaulting
> to -1 seems a better course, and would help us where node 0 is
> short of memory.
>
> Signed-off-by: Andy Whitcroft <apw@shadowen.org>
> Acked-by: Yinghai Lu <yinghai.lu@sun.com>
> Cc: Andi Kleen <ak@suse.de>
> Cc: Jeff Garzik <jeff@garzik.org>
> Cc: Greg KH <greg@kroah.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Andrew,

still need
x86_64-get-mp_bus_to_node-as-early-v2.patch in the -mm
it fix
diff -puN arch/i386/pci/irq.c~x86_64-get-mp_bus_to_node-as-early-v2
arch/i386/pci/irq.c
--- a/arch/i386/pci/irq.c~x86_64-get-mp_bus_to_node-as-early-v2
+++ a/arch/i386/pci/irq.c
@@ -136,10 +136,26 @@ static void __init pirq_peer_trick(void)
               busmap[e->bus] = 1;
       }
       for(i = 1; i < 256; i++) {
+               struct pci_bus *bus = NULL;
+               struct pci_sysdata *sd;
               if (!busmap[i] || pci_find_bus(0, i))
                       continue;
-               if (pci_scan_bus(i, &pci_root_ops, NULL))
+               /* Allocate per-root-bus (not per bus) arch-specific data.
+                * TODO: leak; this memory is never freed.
+                * It's arguable whether it's worth the trouble to care.
+                */
+               sd = kzalloc(sizeof(*sd), GFP_KERNEL);
+               if (!sd) {
+                       printk(KERN_ERR "PCI: OOM, not probing PCI bus %02x\n",
+                               i);
+                       continue;
+               }
+               sd->node = get_mp_bus_to_node(i);
+               bus = pci_scan_bus(i, &pci_root_ops, sd);
+               if (bus)
                       printk(KERN_INFO "PCI: Discovered primary peer
bus %02x [IRQ]\n", i);
+               else
+                       kfree(sd);
       }
       pcibios_last_bus = -1;
 }
diff -puN arch/i386/pci/legacy.c~x86_64-get-mp_bus_to_node-as-early-v2
arch/i386/pci/legacy.c
--- a/arch/i386/pci/legacy.c~x86_64-get-mp_bus_to_node-as-early-v2
+++ a/arch/i386/pci/legacy.c
@@ -12,6 +12,9 @@
 static void __devinit pcibios_fixup_peer_bridges(void)
 {
       int n, devfn;
+       struct pci_bus *bus = NULL;
+       struct pci_sysdata *sd;
+       long node;

       if (pcibios_last_bus <= 0 || pcibios_last_bus >= 0xff)
               return;
@@ -21,12 +24,29 @@ static void __devinit pcibios_fixup_peer
               u32 l;
               if (pci_find_bus(0, n))
                       continue;
+               node = get_mp_bus_to_node(n);
               for (devfn = 0; devfn < 256; devfn += 8) {
                       if (!raw_pci_ops->read(0, n, devfn,
PCI_VENDOR_ID, 2, &l) &&
                           l != 0x0000 && l != 0xffff) {
                               DBG("Found device at %02x:%02x
[%04x]\n", n, devfn, l);
                               printk(KERN_INFO "PCI: Discovered peer
bus %02x\n", n);
-                               pci_scan_bus(n, &pci_root_ops, NULL);
+                               /* Allocate per-root-bus (not per bus)
+                                * arch-specific data.
+                                * TODO: leak; this memory is never freed.
+                                * It's arguable whether it's worth the trouble
+                                * to care.
+                                */
+                               sd = kzalloc(sizeof(*sd), GFP_KERNEL);
+                               if (!sd) {
+                                       printk(KERN_ERR
+                                       "PCI: OOM, not probing PCI bus %02x\n",
+                                                n);
+                                       break;
+                               }
+                               sd->node = node;
+                               bus = pci_scan_bus(n, &pci_root_ops, sd);
+                               if (!bus)
+                                       kfree(sd);
                               break;
                       }
               }

YH

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Oops in 2.6.23-rc1-git9, arch/x86_64/pci/k8-bus.c::fill_mp_bus_to_cpumask()
  2007-08-04 17:45       ` Yinghai Lu
@ 2007-08-04 18:15         ` Andrew Morton
  2007-08-04 19:02           ` Yinghai Lu
  0 siblings, 1 reply; 35+ messages in thread
From: Andrew Morton @ 2007-08-04 18:15 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Andi Kleen, Chuck Ebbert, Muli Ben-Yehuda, linux-kernel,
	riku.seppala, Andy Whitcroft

On Sat, 4 Aug 2007 10:45:31 -0700 "Yinghai Lu" <yhlu.kernel@gmail.com> wrote:

> Andrew,
> 
> still need
> x86_64-get-mp_bus_to_node-as-early-v2.patch in the -mm
> it fix

What does it fix?  Much more detail, please.

> diff -puN arch/i386/pci/irq.c~x86_64-get-mp_bus_to_node-as-early-v2
> arch/i386/pci/irq.c
> --- a/arch/i386/pci/irq.c~x86_64-get-mp_bus_to_node-as-early-v2
> +++ a/arch/i386/pci/irq.c
> @@ -136,10 +136,26 @@ static void __init pirq_peer_trick(void)
>                busmap[e->bus] = 1;
>        }
>        for(i = 1; i < 256; i++) {
> +               struct pci_bus *bus = NULL;
> +               struct pci_sysdata *sd;
>                if (!busmap[i] || pci_find_bus(0, i))
>                        continue;
> -               if (pci_scan_bus(i, &pci_root_ops, NULL))
> +               /* Allocate per-root-bus (not per bus) arch-specific data.
> +                * TODO: leak; this memory is never freed.
> +                * It's arguable whether it's worth the trouble to care.
> +                */
> +               sd = kzalloc(sizeof(*sd), GFP_KERNEL);
> +               if (!sd) {
> +                       printk(KERN_ERR "PCI: OOM, not probing PCI bus %02x\n",
> +                               i);
> +                       continue;
> +               }
> +               sd->node = get_mp_bus_to_node(i);
> +               bus = pci_scan_bus(i, &pci_root_ops, sd);
> +               if (bus)
>                        printk(KERN_INFO "PCI: Discovered primary peer
> bus %02x [IRQ]\n", i);

Wordwrapped, tabs replaced with spaces.

Please, for once and for all, fix your email client?

Thanks.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Oops in 2.6.23-rc1-git9, arch/x86_64/pci/k8-bus.c::fill_mp_bus_to_cpumask()
  2007-08-04 18:15         ` Andrew Morton
@ 2007-08-04 19:02           ` Yinghai Lu
  2007-08-05  5:52             ` Andrew Morton
  0 siblings, 1 reply; 35+ messages in thread
From: Yinghai Lu @ 2007-08-04 19:02 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andi Kleen, Chuck Ebbert, Muli Ben-Yehuda, linux-kernel,
	riku.seppala, Andy Whitcroft

On 8/4/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Sat, 4 Aug 2007 10:45:31 -0700 "Yinghai Lu" <yhlu.kernel@gmail.com> wrote:
>
> > Andrew,
> >
> > still need
x86_64-get-mp_bus_to_node-as-early-v2.patch is already in the -mm

it fixs
pirq_peer_trick(void) in irq.c
pcibios_fixup_peer_bridges in legacy.c

by allocate sd.

YH

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Oops in 2.6.23-rc1-git9, arch/x86_64/pci/k8-bus.c::fill_mp_bus_to_cpumask()
  2007-08-04 16:32     ` Andrew Morton
  2007-08-04 17:45       ` Yinghai Lu
@ 2007-08-04 23:40       ` Andi Kleen
  2007-08-05  4:15         ` Muli Ben-Yehuda
                           ` (2 more replies)
  1 sibling, 3 replies; 35+ messages in thread
From: Andi Kleen @ 2007-08-04 23:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Chuck Ebbert, Muli Ben-Yehuda, linux-kernel, riku.seppala,
	Andy Whitcroft

On Saturday 04 August 2007 18:32:22 Andrew Morton wrote:
> On Sat, 4 Aug 2007 11:30:41 +0200 Andi Kleen <ak@suse.de> wrote:
> 
> > On Saturday 04 August 2007 00:50, Andrew Morton wrote:
> > > On Fri, 03 Aug 2007 18:10:03 -0400
> > >
> > > Chuck Ebbert <cebbert@redhat.com> wrote:
> > > > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=250859
> > > >
> > > > at line 74:
> > > >
> > > > muli@62829:
> > > > muli@62829: 					sd = bus->sysdata;
> > > > muli@62829: 					sd->node = node;   <=====
> > > >
> > > > bus->sysdata is NULL.
> > > >
> > > > Last changed by this hunk of
> > > > "x86-64: introduce struct pci_sysdata to facilitate sharing of
> > > > ->sysdata":
> > 
> > Hmm, will double check. Perhaps Muli's conversion was incomplete.
> 
> hm.
> 
> > > > @@ -67,7 +69,9 @@ fill_mp_bus_to_cpumask(void)
> > > >  						continue;
> > > >  					if (!node_online(node))
> > > >  						node = 0;
> > > > -					bus->sysdata = (void *)node;
> > > > +
> > > > +					sd = bus->sysdata;
> > > > +					sd->node = node;
> > > >  				}
> > > >  			}
> > > >  		}
> > >
> > > Andy keeps trotting out a patch which will probably fix this,
> > 
> > What patch do you mean? I don't have anything sysdata related
> > left over.
> > 
> 
> "pci device ensure sysdata initialised", now at version 4.

Oh what a mess. I think I'll ask Linus to revert the sysdata patch
instead. Clearly the stuff is half-baked

-Andi

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Oops in 2.6.23-rc1-git9, arch/x86_64/pci/k8-bus.c::fill_mp_bus_to_cpumask()
  2007-08-04 23:40       ` Andi Kleen
@ 2007-08-05  4:15         ` Muli Ben-Yehuda
  2007-08-05  4:33           ` Yinghai Lu
  2007-08-05  4:31         ` Yinghai Lu
  2007-08-05  5:04         ` Muli Ben-Yehuda
  2 siblings, 1 reply; 35+ messages in thread
From: Muli Ben-Yehuda @ 2007-08-05  4:15 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrew Morton, Chuck Ebbert, linux-kernel, riku.seppala,
	Andy Whitcroft, Jeff Garzik

On Sun, Aug 05, 2007 at 01:40:49AM +0200, Andi Kleen wrote:

> > "pci device ensure sysdata initialised", now at version 4.
> 
> Oh what a mess. I think I'll ask Linus to revert the sysdata patch
> instead. Clearly the stuff is half-baked

Personally I don't care that much, but I think you'll make jgarzik
very unhappy. The idea is also clearly the right way forward.

So far we know of a single regression and only with 'pci=noacpi'. How
about giving us a little bit of time to get to the bottom of it?
Unfortunately it's not reproducable on any of my machines, but I'm
looking into it.

Cheers,
Muli


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Oops in 2.6.23-rc1-git9, arch/x86_64/pci/k8-bus.c::fill_mp_bus_to_cpumask()
  2007-08-04 23:40       ` Andi Kleen
  2007-08-05  4:15         ` Muli Ben-Yehuda
@ 2007-08-05  4:31         ` Yinghai Lu
  2007-08-05  5:04         ` Muli Ben-Yehuda
  2 siblings, 0 replies; 35+ messages in thread
From: Yinghai Lu @ 2007-08-05  4:31 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrew Morton, Chuck Ebbert, Muli Ben-Yehuda, linux-kernel,
	riku.seppala, Andy Whitcroft

On 8/4/07, Andi Kleen <ak@suse.de> wrote:
> On Saturday 04 August 2007 18:32:22 Andrew Morton wrote:
> > On Sat, 4 Aug 2007 11:30:41 +0200 Andi Kleen <ak@suse.de> wrote:
> >
> > > On Saturday 04 August 2007 00:50, Andrew Morton wrote:
> > > > On Fri, 03 Aug 2007 18:10:03 -0400
> > > >
> > > > Chuck Ebbert <cebbert@redhat.com> wrote:
> > > > > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=250859
> > > > >
> > > > > at line 74:
> > > > >
> > > > > muli@62829:
> > > > > muli@62829:                                     sd = bus->sysdata;
> > > > > muli@62829:                                     sd->node = node;   <=====
> > > > >
> > > > > bus->sysdata is NULL.
> > > > >
> > > > > Last changed by this hunk of
> > > > > "x86-64: introduce struct pci_sysdata to facilitate sharing of
> > > > > ->sysdata":
> > >
> > > Hmm, will double check. Perhaps Muli's conversion was incomplete.
> >
> > hm.
> >
> > > > > @@ -67,7 +69,9 @@ fill_mp_bus_to_cpumask(void)
> > > > >                                                 continue;
> > > > >                                         if (!node_online(node))
> > > > >                                                 node = 0;
> > > > > -                                       bus->sysdata = (void *)node;
> > > > > +
> > > > > +                                       sd = bus->sysdata;
> > > > > +                                       sd->node = node;
> > > > >                                 }
> > > > >                         }
> > > > >                 }
> > > >
> > > > Andy keeps trotting out a patch which will probably fix this,
> > >
> > > What patch do you mean? I don't have anything sysdata related
> > > left over.
> > >
> >
> > "pci device ensure sysdata initialised", now at version 4.
>
> Oh what a mess. I think I'll ask Linus to revert the sysdata patch
> instead. Clearly the stuff is half-baked
>
i noticed that you didn't sign off that patch ( from Muli). So Linus
picked it up directly?

YH

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Oops in 2.6.23-rc1-git9, arch/x86_64/pci/k8-bus.c::fill_mp_bus_to_cpumask()
  2007-08-05  4:15         ` Muli Ben-Yehuda
@ 2007-08-05  4:33           ` Yinghai Lu
  2007-08-05  5:00             ` Muli Ben-Yehuda
  0 siblings, 1 reply; 35+ messages in thread
From: Yinghai Lu @ 2007-08-05  4:33 UTC (permalink / raw)
  To: Muli Ben-Yehuda
  Cc: Andi Kleen, Andrew Morton, Chuck Ebbert, linux-kernel,
	riku.seppala, Andy Whitcroft, Jeff Garzik, Arjan van de Ven

On 8/4/07, Muli Ben-Yehuda <muli@il.ibm.com> wrote:
> On Sun, Aug 05, 2007 at 01:40:49AM +0200, Andi Kleen wrote:
>
> > > "pci device ensure sysdata initialised", now at version 4.
> >
> > Oh what a mess. I think I'll ask Linus to revert the sysdata patch
> > instead. Clearly the stuff is half-baked
>
> Personally I don't care that much, but I think you'll make jgarzik
> very unhappy. The idea is also clearly the right way forward.
>
> So far we know of a single regression and only with 'pci=noacpi'. How
> about giving us a little bit of time to get to the bottom of it?
> Unfortunately it's not reproducable on any of my machines, but I'm
> looking into it.
>

I hope we can use .node and .iommu in pci_bus...

YH

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Oops in 2.6.23-rc1-git9, arch/x86_64/pci/k8-bus.c::fill_mp_bus_to_cpumask()
  2007-08-05  4:33           ` Yinghai Lu
@ 2007-08-05  5:00             ` Muli Ben-Yehuda
  0 siblings, 0 replies; 35+ messages in thread
From: Muli Ben-Yehuda @ 2007-08-05  5:00 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Andi Kleen, Andrew Morton, Chuck Ebbert, linux-kernel,
	riku.seppala, Andy Whitcroft, Jeff Garzik, Arjan van de Ven

On Sat, Aug 04, 2007 at 09:33:58PM -0700, Yinghai Lu wrote:

> I hope we can use .node and .iommu in pci_bus...

pci_bus is shared between different architectures, and not all
architectures need .node and .iommu. Why is this better than using the
(arch specific) ->sysdata?

Cheers,
Muli

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Oops in 2.6.23-rc1-git9, arch/x86_64/pci/k8-bus.c::fill_mp_bus_to_cpumask()
  2007-08-04 23:40       ` Andi Kleen
  2007-08-05  4:15         ` Muli Ben-Yehuda
  2007-08-05  4:31         ` Yinghai Lu
@ 2007-08-05  5:04         ` Muli Ben-Yehuda
  2007-08-05  5:38           ` Yinghai Lu
  2 siblings, 1 reply; 35+ messages in thread
From: Muli Ben-Yehuda @ 2007-08-05  5:04 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrew Morton, Chuck Ebbert, linux-kernel, riku.seppala, Andy Whitcroft

On Sun, Aug 05, 2007 at 01:40:49AM +0200, Andi Kleen wrote:

> Oh what a mess. I think I'll ask Linus to revert the sysdata patch
> instead. Clearly the stuff is half-baked

By the way, Andi, just making sure you're aware the issue Andy's patch
addresses is separate from the 2.6.23-rc1 ->sysdata breakage? 
Reverting my and jgarzik's sysdata patch won't fix Andy's issue.

Cheers,
Muli

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Oops in 2.6.23-rc1-git9, arch/x86_64/pci/k8-bus.c::fill_mp_bus_to_cpumask()
  2007-08-05  5:04         ` Muli Ben-Yehuda
@ 2007-08-05  5:38           ` Yinghai Lu
  2007-08-05  7:53             ` [PATCH/RFT] finish i386 and x86-64 sysdata conversion Muli Ben-Yehuda
  0 siblings, 1 reply; 35+ messages in thread
From: Yinghai Lu @ 2007-08-05  5:38 UTC (permalink / raw)
  To: Muli Ben-Yehuda
  Cc: Andi Kleen, Andrew Morton, Chuck Ebbert, linux-kernel,
	riku.seppala, Andy Whitcroft

On 8/4/07, Muli Ben-Yehuda <muli@il.ibm.com> wrote:
> On Sun, Aug 05, 2007 at 01:40:49AM +0200, Andi Kleen wrote:
>
> > Oh what a mess. I think I'll ask Linus to revert the sysdata patch
> > instead. Clearly the stuff is half-baked
>
> By the way, Andi, just making sure you're aware the issue Andy's patch
> addresses is separate from the 2.6.23-rc1 ->sysdata breakage?
> Reverting my and jgarzik's sysdata patch won't fix Andy's issue.
>


OK at this time, you can take my patch
x86_64-get-mp_bus_to_node-as-early-v2.patch
to fix the chuck the problem.

Chuck,
can you try my patch in -mm.
or you can get that from following link
http://lkml.org/lkml/2007/7/26/377

I like to see acpi=off work without problem on AMD64 platform.

YH

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Oops in 2.6.23-rc1-git9, arch/x86_64/pci/k8-bus.c::fill_mp_bus_to_cpumask()
  2007-08-04 19:02           ` Yinghai Lu
@ 2007-08-05  5:52             ` Andrew Morton
  2007-08-05  6:02               ` Muli Ben-Yehuda
  2007-08-05  6:04               ` Yinghai Lu
  0 siblings, 2 replies; 35+ messages in thread
From: Andrew Morton @ 2007-08-05  5:52 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Andi Kleen, Chuck Ebbert, Muli Ben-Yehuda, linux-kernel,
	riku.seppala, Andy Whitcroft

On Sat, 4 Aug 2007 12:02:35 -0700 "Yinghai Lu" <yhlu.kernel@gmail.com> wrote:

> On 8/4/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> > On Sat, 4 Aug 2007 10:45:31 -0700 "Yinghai Lu" <yhlu.kernel@gmail.com> wrote:
> >
> > > Andrew,
> > >
> > > still need
> x86_64-get-mp_bus_to_node-as-early-v2.patch is already in the -mm
> 
> it fixs
> pirq_peer_trick(void) in irq.c
> pcibios_fixup_peer_bridges in legacy.c
> 
> by allocate sd.
> 

I don't understand you much, sorry.  Are you saying that
x86_64-get-mp_bus_to_node-as-early-v2.patch should be merged in 2.6.23?


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Oops in 2.6.23-rc1-git9, arch/x86_64/pci/k8-bus.c::fill_mp_bus_to_cpumask()
  2007-08-05  5:52             ` Andrew Morton
@ 2007-08-05  6:02               ` Muli Ben-Yehuda
  2007-08-05  6:07                 ` Yinghai Lu
  2007-08-05  6:04               ` Yinghai Lu
  1 sibling, 1 reply; 35+ messages in thread
From: Muli Ben-Yehuda @ 2007-08-05  6:02 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Yinghai Lu, Andi Kleen, Chuck Ebbert, linux-kernel, riku.seppala,
	Andy Whitcroft

On Sat, Aug 04, 2007 at 10:52:22PM -0700, Andrew Morton wrote:
> On Sat, 4 Aug 2007 12:02:35 -0700 "Yinghai Lu" <yhlu.kernel@gmail.com> wrote:
> 
> > On 8/4/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> > > On Sat, 4 Aug 2007 10:45:31 -0700 "Yinghai Lu" <yhlu.kernel@gmail.com> wrote:
> > >
> > > > Andrew,
> > > >
> > > > still need
> > x86_64-get-mp_bus_to_node-as-early-v2.patch is already in the -mm
> > 
> > it fixs
> > pirq_peer_trick(void) in irq.c
> > pcibios_fixup_peer_bridges in legacy.c
> > 
> > by allocate sd.
> > 
> 
> I don't understand you much, sorry.  Are you saying that
> x86_64-get-mp_bus_to_node-as-early-v2.patch should be merged in
> 2.6.23?

If I may try to elaborate, yhlu is pointing out that the ->sysdata
conversion missed a couple of spots in i386 where we need to allocate
a pci_sysdata object as well. Unfortunately I don't think
x86_64-get-mp_bus_to_node-as-early-v2 is a suitable fix as it does a
whole bunch of other stuff too. I am now preparing a minimal patch
(based on x86_64-get-mp_bus_to_node-as-early-v2) that only takes care
of the missing ->sysdata bits. If that patch cures the original bug
and doesn't cause any more regressions, it's definitely 2.6.23
material.

Cheers,
Muli

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Oops in 2.6.23-rc1-git9, arch/x86_64/pci/k8-bus.c::fill_mp_bus_to_cpumask()
  2007-08-05  5:52             ` Andrew Morton
  2007-08-05  6:02               ` Muli Ben-Yehuda
@ 2007-08-05  6:04               ` Yinghai Lu
  1 sibling, 0 replies; 35+ messages in thread
From: Yinghai Lu @ 2007-08-05  6:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andi Kleen, Chuck Ebbert, Muli Ben-Yehuda, linux-kernel,
	riku.seppala, Andy Whitcroft

On 8/4/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Sat, 4 Aug 2007 12:02:35 -0700 "Yinghai Lu" <yhlu.kernel@gmail.com> wrote:
>
> > On 8/4/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> > > On Sat, 4 Aug 2007 10:45:31 -0700 "Yinghai Lu" <yhlu.kernel@gmail.com> wrote:
> > >
> > > > Andrew,
> > > >
> > > > still need
> > x86_64-get-mp_bus_to_node-as-early-v2.patch is already in the -mm
> >
> > it fixs
> > pirq_peer_trick(void) in irq.c
> > pcibios_fixup_peer_bridges in legacy.c
> >
> > by allocate sd.
> >
>
> I don't understand you much, sorry.  Are you saying that
> x86_64-get-mp_bus_to_node-as-early-v2.patch should be merged in 2.6.23?

Yes.

YH

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Oops in 2.6.23-rc1-git9, arch/x86_64/pci/k8-bus.c::fill_mp_bus_to_cpumask()
  2007-08-05  6:02               ` Muli Ben-Yehuda
@ 2007-08-05  6:07                 ` Yinghai Lu
  2007-08-05  6:11                   ` Muli Ben-Yehuda
  0 siblings, 1 reply; 35+ messages in thread
From: Yinghai Lu @ 2007-08-05  6:07 UTC (permalink / raw)
  To: Muli Ben-Yehuda
  Cc: Andrew Morton, Andi Kleen, Chuck Ebbert, linux-kernel,
	riku.seppala, Andy Whitcroft

On 8/4/07, Muli Ben-Yehuda <muli@il.ibm.com> wrote:
> On Sat, Aug 04, 2007 at 10:52:22PM -0700, Andrew Morton wrote:
> > On Sat, 4 Aug 2007 12:02:35 -0700 "Yinghai Lu" <yhlu.kernel@gmail.com> wrote:
> >
> > > On 8/4/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> > > > On Sat, 4 Aug 2007 10:45:31 -0700 "Yinghai Lu" <yhlu.kernel@gmail.com> wrote:
> > > >
> > > > > Andrew,
> > > > >
> > > > > still need
> > > x86_64-get-mp_bus_to_node-as-early-v2.patch is already in the -mm
> > >
> > > it fixs
> > > pirq_peer_trick(void) in irq.c
> > > pcibios_fixup_peer_bridges in legacy.c
> > >
> > > by allocate sd.
> > >
> >
> > I don't understand you much, sorry.  Are you saying that
> > x86_64-get-mp_bus_to_node-as-early-v2.patch should be merged in
> > 2.6.23?
>
> If I may try to elaborate, yhlu is pointing out that the ->sysdata
> conversion missed a couple of spots in i386 where we need to allocate
> a pci_sysdata object as well. Unfortunately I don't think
> x86_64-get-mp_bus_to_node-as-early-v2 is a suitable fix as it does a
> whole bunch of other stuff too. I am now preparing a minimal patch
> (based on x86_64-get-mp_bus_to_node-as-early-v2) that only takes care
> of the missing ->sysdata bits. If that patch cures the original bug
> and doesn't cause any more regressions, it's definitely 2.6.23
> material.

You just use my patch plus Andy's patch together...

YH

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Oops in 2.6.23-rc1-git9, arch/x86_64/pci/k8-bus.c::fill_mp_bus_to_cpumask()
  2007-08-05  6:07                 ` Yinghai Lu
@ 2007-08-05  6:11                   ` Muli Ben-Yehuda
  2007-08-05  6:24                     ` Yinghai Lu
  0 siblings, 1 reply; 35+ messages in thread
From: Muli Ben-Yehuda @ 2007-08-05  6:11 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Andrew Morton, Andi Kleen, Chuck Ebbert, linux-kernel,
	riku.seppala, Andy Whitcroft

On Sat, Aug 04, 2007 at 11:07:04PM -0700, Yinghai Lu wrote:

> You just use my patch plus Andy's patch together...

Your patch does other things that in my opinion are not appropriate at
this stage for 2.6.23. I am aiming for the minimal change that will
fix the regression, and your patch should be considered for 2.6.24.

Cheers,
Muli

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Oops in 2.6.23-rc1-git9, arch/x86_64/pci/k8-bus.c::fill_mp_bus_to_cpumask()
  2007-08-05  6:11                   ` Muli Ben-Yehuda
@ 2007-08-05  6:24                     ` Yinghai Lu
  2007-08-05  6:27                       ` Yinghai Lu
  0 siblings, 1 reply; 35+ messages in thread
From: Yinghai Lu @ 2007-08-05  6:24 UTC (permalink / raw)
  To: Muli Ben-Yehuda
  Cc: Andrew Morton, Andi Kleen, Chuck Ebbert, linux-kernel,
	riku.seppala, Andy Whitcroft

On 8/4/07, Muli Ben-Yehuda <muli@il.ibm.com> wrote:
> On Sat, Aug 04, 2007 at 11:07:04PM -0700, Yinghai Lu wrote:
>
> > You just use my patch plus Andy's patch together...
>
> Your patch does other things that in my opinion are not appropriate at
> this stage for 2.6.23. I am aiming for the minimal change that will
> fix the regression, and your patch should be considered for 2.6.24.
>
if Andi picked my batch v1 version before your sysdata patch, then
there would be this regression

YH..

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Oops in 2.6.23-rc1-git9, arch/x86_64/pci/k8-bus.c::fill_mp_bus_to_cpumask()
  2007-08-05  6:24                     ` Yinghai Lu
@ 2007-08-05  6:27                       ` Yinghai Lu
  0 siblings, 0 replies; 35+ messages in thread
From: Yinghai Lu @ 2007-08-05  6:27 UTC (permalink / raw)
  To: Muli Ben-Yehuda
  Cc: Andrew Morton, Andi Kleen, Chuck Ebbert, linux-kernel,
	riku.seppala, Andy Whitcroft

On 8/4/07, Yinghai Lu <yhlu.kernel@gmail.com> wrote:
> On 8/4/07, Muli Ben-Yehuda <muli@il.ibm.com> wrote:
> > On Sat, Aug 04, 2007 at 11:07:04PM -0700, Yinghai Lu wrote:
> >
> > > You just use my patch plus Andy's patch together...
> >
> > Your patch does other things that in my opinion are not appropriate at
> > this stage for 2.6.23. I am aiming for the minimal change that will
> > fix the regression, and your patch should be considered for 2.6.24.
> >
> if Andi picked my batch v1 version before your sysdata patch, then
> there would be this regression
i mean:
if Andi picked up my batch v1 version before your sysdata patch, then
there would not be this regression
YH

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH/RFT] finish i386 and x86-64 sysdata conversion
  2007-08-05  5:38           ` Yinghai Lu
@ 2007-08-05  7:53             ` Muli Ben-Yehuda
  2007-08-05  8:49               ` Yinghai Lu
  2007-08-07 22:49               ` Andrew Morton
  0 siblings, 2 replies; 35+ messages in thread
From: Muli Ben-Yehuda @ 2007-08-05  7:53 UTC (permalink / raw)
  To: Yinghai Lu, Andi Kleen, Andrew Morton, riku.seppala, Andy Whitcroft
  Cc: Chuck Ebbert, linux-kernel, Jeff Garzik, Muli Ben-Yehuda

This patch finishes the i386 and x86-64 ->sysdata conversion and
hopefully also fixes Riku's and Andy's observed bugs. It is based on
Yinghai Lu's and Andy Whitcroft's patches (thanks!) with some changes:

- introduce pci_scan_bus_with_sysdata() and use it instead of
  pci_scan_bus() where appropriate. pci_scan_bus_with_sysdata() will
  allocate the sysdata structure and then call pci_scan_bus().
- always allocate pci_sysdata dynamically. The whole point of this
  sysdata work is to make it easy to do root-bus specific things
  (e.g., support PCI domains and IOMMU's). I dislike using a default
  struct pci_sysdata in some places and a dynamically allocated
  pci_sysdata elsewhere - the potential for someone indavertantly
  changing the default structure is too high.
- this patch only makes the minimal changes necessary, i.e., the NUMA node is
  always initialized to -1. Patches to do the right thing with regards
  to the NUMA node can build on top of this (either add a 'node'
  parameter to pci_scan_bus_with_sysdata() or just update the node
  when it becomes known).

The patch was compile tested with various configurations (e.g., NUMAQ,
VISWS) and run-time tested on i386 and x86-64. Unfortunately none of
my machines exhibited the bugs so caveat emptor.

Andy, could you please see if this fixes the NUMA issues you've seen?
Riku, does this fix "pci=noacpi" on your laptop?

Comments appreciated. If this looks ok it should go into 2.6.23.

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chuck Ebbert <cebbert@redhat.com>,
Cc: riku.seppala@kymp.net
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Jeff Garzik <jeff@garzik.org>
---
 arch/i386/pci/common.c   |   23 +++++++++++++++++++++++
 arch/i386/pci/fixup.c    |    6 +++---
 arch/i386/pci/irq.c      |    5 +++--
 arch/i386/pci/legacy.c   |    2 +-
 arch/i386/pci/numa.c     |   15 +++++++++------
 arch/i386/pci/visws.c    |    4 ++--
 include/asm-i386/pci.h   |    3 +++
 include/asm-x86_64/pci.h |    2 ++
 8 files changed, 46 insertions(+), 14 deletions(-)

diff --git a/arch/i386/pci/common.c b/arch/i386/pci/common.c
index 85503de..ebc6f3c 100644
--- a/arch/i386/pci/common.c
+++ b/arch/i386/pci/common.c
@@ -455,3 +455,26 @@ void pcibios_disable_device (struct pci_dev *dev)
 	if (!dev->msi_enabled && pcibios_disable_irq)
 		pcibios_disable_irq(dev);
 }
+
+struct pci_bus *pci_scan_bus_with_sysdata(int busno)
+{
+	struct pci_bus *bus = NULL;
+	struct pci_sysdata *sd;
+
+	/*
+	 * Allocate per-root-bus (not per bus) arch-specific data.
+	 * TODO: leak; this memory is never freed.
+	 * It's arguable whether it's worth the trouble to care.
+	 */
+	sd = kzalloc(sizeof(*sd), GFP_KERNEL);
+	if (!sd) {
+		printk(KERN_ERR "PCI: OOM, skipping PCI bus %02x\n", busno);
+		return NULL;
+	}
+	sd->node = -1;
+	bus = pci_scan_bus(busno, &pci_root_ops, sd);
+	if (!bus)
+		kfree(sd);
+
+	return bus;
+}
diff --git a/arch/i386/pci/fixup.c b/arch/i386/pci/fixup.c
index e7306db..c82cbf4 100644
--- a/arch/i386/pci/fixup.c
+++ b/arch/i386/pci/fixup.c
@@ -25,9 +25,9 @@ static void __devinit pci_fixup_i450nx(struct pci_dev *d)
 		pci_read_config_byte(d, reg++, &subb);
 		DBG("i450NX PXB %d: %02x/%02x/%02x\n", pxb, busno, suba, subb);
 		if (busno)
-			pci_scan_bus(busno, &pci_root_ops, NULL);	/* Bus A */
+			pci_scan_bus_with_sysdata(busno);	/* Bus A */
 		if (suba < subb)
-			pci_scan_bus(suba+1, &pci_root_ops, NULL);	/* Bus B */
+			pci_scan_bus_with_sysdata(suba+1);	/* Bus B */
 	}
 	pcibios_last_bus = -1;
 }
@@ -42,7 +42,7 @@ static void __devinit pci_fixup_i450gx(struct pci_dev *d)
 	u8 busno;
 	pci_read_config_byte(d, 0x4a, &busno);
 	printk(KERN_INFO "PCI: i440KX/GX host bridge %s: secondary bus %02x\n", pci_name(d), busno);
-	pci_scan_bus(busno, &pci_root_ops, NULL);
+	pci_scan_bus_with_sysdata(busno);
 	pcibios_last_bus = -1;
 }
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82454GX, pci_fixup_i450gx);
diff --git a/arch/i386/pci/irq.c b/arch/i386/pci/irq.c
index f2cb942..665db06 100644
--- a/arch/i386/pci/irq.c
+++ b/arch/i386/pci/irq.c
@@ -138,8 +138,9 @@ static void __init pirq_peer_trick(void)
 	for(i = 1; i < 256; i++) {
 		if (!busmap[i] || pci_find_bus(0, i))
 			continue;
-		if (pci_scan_bus(i, &pci_root_ops, NULL))
-			printk(KERN_INFO "PCI: Discovered primary peer bus %02x [IRQ]\n", i);
+		if (pci_scan_bus_with_sysdata(i))
+			printk(KERN_INFO "PCI: Discovered primary peer "
+			       "bus %02x [IRQ]\n", i);
 	}
 	pcibios_last_bus = -1;
 }
diff --git a/arch/i386/pci/legacy.c b/arch/i386/pci/legacy.c
index 149a958..5565d70 100644
--- a/arch/i386/pci/legacy.c
+++ b/arch/i386/pci/legacy.c
@@ -26,7 +26,7 @@ static void __devinit pcibios_fixup_peer_bridges(void)
 			    l != 0x0000 && l != 0xffff) {
 				DBG("Found device at %02x:%02x [%04x]\n", n, devfn, l);
 				printk(KERN_INFO "PCI: Discovered peer bus %02x\n", n);
-				pci_scan_bus(n, &pci_root_ops, NULL);
+				pci_scan_bus_with_sysdata(n);
 				break;
 			}
 		}
diff --git a/arch/i386/pci/numa.c b/arch/i386/pci/numa.c
index adbe17a..f5f165f 100644
--- a/arch/i386/pci/numa.c
+++ b/arch/i386/pci/numa.c
@@ -96,10 +96,14 @@ static void __devinit pci_fixup_i450nx(struct pci_dev *d)
 		pci_read_config_byte(d, reg++, &suba);
 		pci_read_config_byte(d, reg++, &subb);
 		DBG("i450NX PXB %d: %02x/%02x/%02x\n", pxb, busno, suba, subb);
-		if (busno)
-			pci_scan_bus(QUADLOCAL2BUS(quad,busno), &pci_root_ops, NULL);	/* Bus A */
-		if (suba < subb)
-			pci_scan_bus(QUADLOCAL2BUS(quad,suba+1), &pci_root_ops, NULL);	/* Bus B */
+		if (busno) {
+			/* Bus A */
+			pci_scan_bus_with_sysdata(QUADLOCAL2BUS(quad, busno));
+		}
+		if (suba < subb) {
+			/* Bus B */
+			pci_scan_bus_with_sysdata(QUADLOCAL2BUS(quad, suba+1));
+		}
 	}
 	pcibios_last_bus = -1;
 }
@@ -123,8 +127,7 @@ static int __init pci_numa_init(void)
 				continue;
 			printk("Scanning PCI bus %d for quad %d\n", 
 				QUADLOCAL2BUS(quad,0), quad);
-			pci_scan_bus(QUADLOCAL2BUS(quad,0), 
-				&pci_root_ops, NULL);
+			pci_scan_bus_with_sysdata(QUADLOCAL2BUS(quad, 0));
 		}
 	return 0;
 }
diff --git a/arch/i386/pci/visws.c b/arch/i386/pci/visws.c
index f1b486d..8ecb1c7 100644
--- a/arch/i386/pci/visws.c
+++ b/arch/i386/pci/visws.c
@@ -101,8 +101,8 @@ static int __init pcibios_init(void)
 		"bridge B (PIIX4) bus: %u\n", pci_bus1, pci_bus0);
 
 	raw_pci_ops = &pci_direct_conf1;
-	pci_scan_bus(pci_bus0, &pci_root_ops, NULL);
-	pci_scan_bus(pci_bus1, &pci_root_ops, NULL);
+	pci_scan_bus_with_sysdata(pci_bus0);
+	pci_scan_bus_with_sysdata(pci_bus1);
 	pci_fixup_irqs(visws_swizzle, visws_map_irq);
 	pcibios_resource_survey();
 	return 0;
diff --git a/include/asm-i386/pci.h b/include/asm-i386/pci.h
index d790343..4fcacc7 100644
--- a/include/asm-i386/pci.h
+++ b/include/asm-i386/pci.h
@@ -8,6 +8,9 @@ struct pci_sysdata {
 	int		node;		/* NUMA node */
 };
 
+/* scan a bus after allocating a pci_sysdata for it */
+extern struct pci_bus *pci_scan_bus_with_sysdata(int busno);
+
 #include <linux/mm.h>		/* for struct page */
 
 /* Can be used to override the logic in pci_scan_bus for skipping
diff --git a/include/asm-x86_64/pci.h b/include/asm-x86_64/pci.h
index 88926eb..5da8cb0 100644
--- a/include/asm-x86_64/pci.h
+++ b/include/asm-x86_64/pci.h
@@ -10,6 +10,8 @@ struct pci_sysdata {
 	void*		iommu;		/* IOMMU private data */
 };
 
+extern struct pci_bus *pci_scan_bus_with_sysdata(int busno);
+
 #ifdef CONFIG_CALGARY_IOMMU
 static inline void* pci_iommu(struct pci_bus *bus)
 {
-- 
1.5.2




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH/RFT] finish i386 and x86-64 sysdata conversion
  2007-08-05  7:53             ` [PATCH/RFT] finish i386 and x86-64 sysdata conversion Muli Ben-Yehuda
@ 2007-08-05  8:49               ` Yinghai Lu
  2007-08-05 11:54                 ` Muli Ben-Yehuda
  2007-08-07 22:49               ` Andrew Morton
  1 sibling, 1 reply; 35+ messages in thread
From: Yinghai Lu @ 2007-08-05  8:49 UTC (permalink / raw)
  To: Muli Ben-Yehuda
  Cc: Andi Kleen, Andrew Morton, riku.seppala, Andy Whitcroft,
	Chuck Ebbert, linux-kernel, Jeff Garzik

On 8/5/07, Muli Ben-Yehuda <muli@il.ibm.com> wrote:
> This patch finishes the i386 and x86-64 ->sysdata conversion and
> hopefully also fixes Riku's and Andy's observed bugs. It is based on
> Yinghai Lu's and Andy Whitcroft's patches (thanks!) with some changes:
>
> - introduce pci_scan_bus_with_sysdata() and use it instead of
>   pci_scan_bus() where appropriate. pci_scan_bus_with_sysdata() will
>   allocate the sysdata structure and then call pci_scan_bus().
> - always allocate pci_sysdata dynamically. The whole point of this
>   sysdata work is to make it easy to do root-bus specific things
>   (e.g., support PCI domains and IOMMU's). I dislike using a default
>   struct pci_sysdata in some places and a dynamically allocated
>   pci_sysdata elsewhere - the potential for someone indavertantly
>   changing the default structure is too high.
> - this patch only makes the minimal changes necessary, i.e., the NUMA node is
>   always initialized to -1. Patches to do the right thing with regards
>   to the NUMA node can build on top of this (either add a 'node'
>   parameter to pci_scan_bus_with_sysdata() or just update the node
>   when it becomes known).
>
> The patch was compile tested with various configurations (e.g., NUMAQ,
> VISWS) and run-time tested on i386 and x86-64. Unfortunately none of
> my machines exhibited the bugs so caveat emptor.
>
> Andy, could you please see if this fixes the NUMA issues you've seen?
> Riku, does this fix "pci=noacpi" on your laptop?
>
> Comments appreciated. If this looks ok it should go into 2.6.23.
>
> Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
> Cc: Yinghai Lu <yhlu.kernel@gmail.com>
> Cc: Andi Kleen <ak@suse.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Chuck Ebbert <cebbert@redhat.com>,
> Cc: riku.seppala@kymp.net
> Cc: Andy Whitcroft <apw@shadowen.org>
> Cc: Jeff Garzik <jeff@garzik.org>
> ---

sounds good.

Can you change
pci_scan_bus_with_sysdata(int busno)
to
pci_scan_bus_on_node(int bus, struct pci_ops *ops, int node)?

pci_scan_bus_with_sysdata(int busno) make me feel that i need feed one
sysdata as on param for it.

I only need to update irq.c and legacy.c in my patch.

YH

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH/RFT] finish i386 and x86-64 sysdata conversion
  2007-08-05  8:49               ` Yinghai Lu
@ 2007-08-05 11:54                 ` Muli Ben-Yehuda
  2007-08-05 16:39                   ` Yinghai Lu
  0 siblings, 1 reply; 35+ messages in thread
From: Muli Ben-Yehuda @ 2007-08-05 11:54 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Andi Kleen, Andrew Morton, riku.seppala, Andy Whitcroft,
	Chuck Ebbert, linux-kernel, Jeff Garzik

On Sun, Aug 05, 2007 at 01:49:57AM -0700, Yinghai Lu wrote:

> Can you change
> pci_scan_bus_with_sysdata(int busno)
> to
> pci_scan_bus_on_node(int bus, struct pci_ops *ops, int node)?

Do you anticipate passing in a different pci_ops or node? 
In any case please remember I am aiming for the minimal "obviously
correc" change for 2.6.23...

> pci_scan_bus_with_sysdata(int busno) make me feel that i need feed one
> sysdata as on param for it.

Yeah, lousy name, but the best I came up with. Runner up was
'x86_pci_scan_bus', which is I think worse?

Cheers,
Muli

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH/RFT] finish i386 and x86-64 sysdata conversion
  2007-08-05 11:54                 ` Muli Ben-Yehuda
@ 2007-08-05 16:39                   ` Yinghai Lu
  2007-08-05 17:36                     ` Jeff Garzik
  0 siblings, 1 reply; 35+ messages in thread
From: Yinghai Lu @ 2007-08-05 16:39 UTC (permalink / raw)
  To: Muli Ben-Yehuda
  Cc: Andi Kleen, Andrew Morton, riku.seppala, Andy Whitcroft,
	Chuck Ebbert, linux-kernel, Jeff Garzik

On 8/5/07, Muli Ben-Yehuda <muli@il.ibm.com> wrote:
> On Sun, Aug 05, 2007 at 01:49:57AM -0700, Yinghai Lu wrote:
>
> > Can you change
> > pci_scan_bus_with_sysdata(int busno)
> > to
> > pci_scan_bus_on_node(int bus, struct pci_ops *ops, int node)?
>
> Do you anticipate passing in a different pci_ops or node?
> In any case please remember I am aiming for the minimal "obviously
> correc" change for 2.6.23...
>
> > pci_scan_bus_with_sysdata(int busno) make me feel that i need feed one
> > sysdata as on param for it.
>
> Yeah, lousy name, but the best I came up with. Runner up was
> 'x86_pci_scan_bus', which is I think worse?
>
or

pci_scan_bus_on_node(int bus, struct pci_ops *ops, int node)
x86_pci_scan_root_bus(int bus)
{
  pci_scan_bus_on_node(bus, &pci_root_ops, -1);
}

i need node as one param for my patch later in irq.c and legacy.c

YH

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH/RFT] finish i386 and x86-64 sysdata conversion
  2007-08-05 16:39                   ` Yinghai Lu
@ 2007-08-05 17:36                     ` Jeff Garzik
  2007-08-05 20:41                       ` Yinghai Lu
  0 siblings, 1 reply; 35+ messages in thread
From: Jeff Garzik @ 2007-08-05 17:36 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Muli Ben-Yehuda, Andi Kleen, Andrew Morton, riku.seppala,
	Andy Whitcroft, Chuck Ebbert, linux-kernel

Yinghai Lu wrote:
> pci_scan_bus_on_node(int bus, struct pci_ops *ops, int node)
> x86_pci_scan_root_bus(int bus)
> {
>   pci_scan_bus_on_node(bus, &pci_root_ops, -1);
> }
> 
> i need node as one param for my patch later in irq.c and legacy.c


It is a mistake to start coding NUMA details into pci scan functions.

Anywhere the current code does not set the NUMA node, set it to -1 or 
some other default value.

	Jeff



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH/RFT] finish i386 and x86-64 sysdata conversion
  2007-08-05 17:36                     ` Jeff Garzik
@ 2007-08-05 20:41                       ` Yinghai Lu
  0 siblings, 0 replies; 35+ messages in thread
From: Yinghai Lu @ 2007-08-05 20:41 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Muli Ben-Yehuda, Andi Kleen, Andrew Morton, riku.seppala,
	Andy Whitcroft, Chuck Ebbert, linux-kernel

On 8/5/07, Jeff Garzik <jeff@garzik.org> wrote:
> Yinghai Lu wrote:
> > pci_scan_bus_on_node(int bus, struct pci_ops *ops, int node)
> > x86_pci_scan_root_bus(int bus)
> > {
> >   pci_scan_bus_on_node(bus, &pci_root_ops, -1);
> > }
> >
> > i need node as one param for my patch later in irq.c and legacy.c
>
>
> It is a mistake to start coding NUMA details into pci scan functions.
>
> Anywhere the current code does not set the NUMA node, set it to -1 or
> some other default value.

Can you check
http://lkml.org/lkml/2007/7/26/377
http://lkml.org/lkml/2007/7/26/378
http://lkml.org/lkml/2007/7/26/379

it will make sure numa_node on device get correct value after pci scan.
esp for k8 system with second peer root bus on second node.

Thanks

Yinghai Lu

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH/RFT] finish i386 and x86-64 sysdata conversion
  2007-08-05  7:53             ` [PATCH/RFT] finish i386 and x86-64 sysdata conversion Muli Ben-Yehuda
  2007-08-05  8:49               ` Yinghai Lu
@ 2007-08-07 22:49               ` Andrew Morton
  2007-08-07 22:56                 ` Muli Ben-Yehuda
  1 sibling, 1 reply; 35+ messages in thread
From: Andrew Morton @ 2007-08-07 22:49 UTC (permalink / raw)
  To: Muli Ben-Yehuda
  Cc: Yinghai Lu, Andi Kleen, riku.seppala, Andy Whitcroft,
	Chuck Ebbert, linux-kernel, Jeff Garzik

On Sun, 5 Aug 2007 10:53:07 +0300
Muli Ben-Yehuda <muli@il.ibm.com> wrote:

> This patch finishes the i386 and x86-64 ->sysdata conversion and
> hopefully also fixes Riku's and Andy's observed bugs. It is based on
> Yinghai Lu's and Andy Whitcroft's patches (thanks!) with some changes:
> 
> - introduce pci_scan_bus_with_sysdata() and use it instead of
>   pci_scan_bus() where appropriate. pci_scan_bus_with_sysdata() will
>   allocate the sysdata structure and then call pci_scan_bus().
> - always allocate pci_sysdata dynamically. The whole point of this
>   sysdata work is to make it easy to do root-bus specific things
>   (e.g., support PCI domains and IOMMU's). I dislike using a default
>   struct pci_sysdata in some places and a dynamically allocated
>   pci_sysdata elsewhere - the potential for someone indavertantly
>   changing the default structure is too high.
> - this patch only makes the minimal changes necessary, i.e., the NUMA node is
>   always initialized to -1. Patches to do the right thing with regards
>   to the NUMA node can build on top of this (either add a 'node'
>   parameter to pci_scan_bus_with_sysdata() or just update the node
>   when it becomes known).
> 
> The patch was compile tested with various configurations (e.g., NUMAQ,
> VISWS) and run-time tested on i386 and x86-64. Unfortunately none of
> my machines exhibited the bugs so caveat emptor.
> 
> Andy, could you please see if this fixes the NUMA issues you've seen?
> Riku, does this fix "pci=noacpi" on your laptop?

I am sooooooo tired of this thing.  Andi, someone, can we for heaven's
sake please just get it all sorted out?

I dropped two of Yinghai's patches which conflicted with this, and then
two more which looked like they depended on the dropped ones.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH/RFT] finish i386 and x86-64 sysdata conversion
  2007-08-07 22:49               ` Andrew Morton
@ 2007-08-07 22:56                 ` Muli Ben-Yehuda
  2007-08-08  0:43                   ` Jeff Garzik
  0 siblings, 1 reply; 35+ messages in thread
From: Muli Ben-Yehuda @ 2007-08-07 22:56 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Yinghai Lu, Andi Kleen, riku.seppala, Andy Whitcroft,
	Chuck Ebbert, linux-kernel, Jeff Garzik

On Tue, Aug 07, 2007 at 03:49:11PM -0700, Andrew Morton wrote:

> I am sooooooo tired of this thing.  Andi, someone, can we for heaven's
> sake please just get it all sorted out?

With regards to the sysdata conversion: Riku says he cannot test new
kernel. I haven't heard anything from Andy Whitcroft. It passes all of
my tests and what we have now is obviously broken... I think we should
put the fix in.

Cheers,
Muli

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH/RFT] finish i386 and x86-64 sysdata conversion
  2007-08-07 22:56                 ` Muli Ben-Yehuda
@ 2007-08-08  0:43                   ` Jeff Garzik
  2007-08-08  1:09                     ` Yinghai Lu
  0 siblings, 1 reply; 35+ messages in thread
From: Jeff Garzik @ 2007-08-08  0:43 UTC (permalink / raw)
  To: Muli Ben-Yehuda
  Cc: Andrew Morton, Yinghai Lu, Andi Kleen, riku.seppala,
	Andy Whitcroft, Chuck Ebbert, linux-kernel

Muli Ben-Yehuda wrote:
> On Tue, Aug 07, 2007 at 03:49:11PM -0700, Andrew Morton wrote:
> 
>> I am sooooooo tired of this thing.  Andi, someone, can we for heaven's
>> sake please just get it all sorted out?
> 
> With regards to the sysdata conversion: Riku says he cannot test new
> kernel. I haven't heard anything from Andy Whitcroft. It passes all of
> my tests and what we have now is obviously broken... I think we should
> put the fix in.

Strongly agreed.  It overall fixes bugs that existing _before_ the 
sysdata stuff went in.

It was clear the NUMA node was /not initialized/ in all cases, and I 
think it's quite unfair to revert the sysdata stuff because of 
pre-existing bugs (that we are fixing right now anyway).

	Jeff




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH/RFT] finish i386 and x86-64 sysdata conversion
  2007-08-08  0:43                   ` Jeff Garzik
@ 2007-08-08  1:09                     ` Yinghai Lu
  2007-08-08  1:21                       ` Jeff Garzik
  0 siblings, 1 reply; 35+ messages in thread
From: Yinghai Lu @ 2007-08-08  1:09 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Muli Ben-Yehuda, Andrew Morton, Andi Kleen, riku.seppala,
	Andy Whitcroft, Chuck Ebbert, linux-kernel

On 8/7/07, Jeff Garzik <jeff@garzik.org> wrote:
> Muli Ben-Yehuda wrote:
>
> Strongly agreed.  It overall fixes bugs that existing _before_ the
> sysdata stuff went in.

what are those bugs?

the sysdata stuff try to fix:
when calgary iommu code was introduced, it is trying to share sysdata
that orginally is initialized k8_bus.c and used by pcibus_to_node. but
that only happen with AMD K8 platform. and have nothing to do with
calgary.
Then my patch get_mp_bus_to_node_as_early will use that sysdata as
node as early as possible before the pci_scan bus is called.

then Mul came out his patch ...to split the sharing that will never happen.

But andi picked muli patch even my patch stay a while in -mm.

Also i don't think anyone test muli sysdata patch with amd64 platform
with acpi=off. even muli himself.

calgary support bus numa at this time? i mean mult peer root bus on
different node..

anyway I will produce my patch in v3...

YH

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH/RFT] finish i386 and x86-64 sysdata conversion
  2007-08-08  1:09                     ` Yinghai Lu
@ 2007-08-08  1:21                       ` Jeff Garzik
  2007-08-08  1:28                         ` Yinghai Lu
  2007-08-08  2:59                         ` Yinghai Lu
  0 siblings, 2 replies; 35+ messages in thread
From: Jeff Garzik @ 2007-08-08  1:21 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Muli Ben-Yehuda, Andrew Morton, Andi Kleen, riku.seppala,
	Andy Whitcroft, Chuck Ebbert, linux-kernel

Yinghai Lu wrote:
> the sysdata stuff try to fix:
> when calgary iommu code was introduced, it is trying to share sysdata
> that orginally is initialized k8_bus.c and used by pcibus_to_node. but
> that only happen with AMD K8 platform. and have nothing to do with
> calgary.

incorrect:  sysdata is initialized when the structure is created, which 
clearly not only in k8_bus.c


> Then my patch get_mp_bus_to_node_as_early will use that sysdata as
> node as early as possible before the pci_scan bus is called.

that's incorrect, since it assumes NUMA owns ->sysdata


> then Mul came out his patch ...to split the sharing that will never happen.

that's incorrect, ->sysdata will be shared among Calgary, NUMA, and 
PCI-domain support.


> Also i don't think anyone test muli sysdata patch with amd64 platform
> with acpi=off. even muli himself.

I did.  I did all PCI domain support development on AMD64 platform.


> calgary support bus numa at this time? i mean mult peer root bus on
> different node..

My PCI domains patch add support for multiple peer root buses.  That's 
the definition of PCI domain support.

Branch 'pciseg' of
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/misc-2.6.git

	Jeff



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH/RFT] finish i386 and x86-64 sysdata conversion
  2007-08-08  1:21                       ` Jeff Garzik
@ 2007-08-08  1:28                         ` Yinghai Lu
  2007-08-08  2:59                         ` Yinghai Lu
  1 sibling, 0 replies; 35+ messages in thread
From: Yinghai Lu @ 2007-08-08  1:28 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Muli Ben-Yehuda, Andrew Morton, Andi Kleen, riku.seppala,
	Andy Whitcroft, Chuck Ebbert, linux-kernel

On 8/7/07, Jeff Garzik <jeff@garzik.org> wrote:
> Yinghai Lu wrote:
> > the sysdata stuff try to fix:
> > when calgary iommu code was introduced, it is trying to share sysdata
> > that orginally is initialized k8_bus.c and used by pcibus_to_node. but
> > that only happen with AMD K8 platform. and have nothing to do with
> > calgary.
>
> incorrect:  sysdata is initialized when the structure is created, which
> clearly not only in k8_bus.c

before his first patch, numa juse use sysdata pointer for node.

>
>
> > Then my patch get_mp_bus_to_node_as_early will use that sysdata as
> > node as early as possible before the pci_scan bus is called.
>
> that's incorrect, since it assumes NUMA owns ->sysdata

At that time, only it use that.

>
>
> > then Mul came out his patch ...to split the sharing that will never happen.
>
> that's incorrect, ->sysdata will be shared among Calgary, NUMA, and
> PCI-domain support.

but not the same time on run time system with Calgary, NUMA.

>
>
> > Also i don't think anyone test muli sysdata patch with amd64 platform
> > with acpi=off. even muli himself.
>
> I did.  I did all PCI domain support development on AMD64 platform.

really, but his first patch is broken with acpi=off.
I guess you tested his patch plus your patch. instead of test his
patch barely.  I tested his patch + my patch, and it works well, and
even test his patch only.

>
>
> > calgary support bus numa at this time? i mean mult peer root bus on
> > different node..
>
> My PCI domains patch add support for multiple peer root buses.  That's
> the definition of PCI domain support.
>
> Branch 'pciseg' of
> git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/misc-2.6.git

good, i will look at it.

YH

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH/RFT] finish i386 and x86-64 sysdata conversion
  2007-08-08  1:21                       ` Jeff Garzik
  2007-08-08  1:28                         ` Yinghai Lu
@ 2007-08-08  2:59                         ` Yinghai Lu
  1 sibling, 0 replies; 35+ messages in thread
From: Yinghai Lu @ 2007-08-08  2:59 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Muli Ben-Yehuda, Andrew Morton, Andi Kleen, riku.seppala,
	Andy Whitcroft, Chuck Ebbert, linux-kernel

On 8/7/07, Jeff Garzik <jeff@garzik.org> wrote:
>
> My PCI domains patch add support for multiple peer root buses.  That's
> the definition of PCI domain support.

one one segment, we can not have multi peer root buses?

>
> Branch 'pciseg' of
> git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/misc-2.6.git

is your test system based AMD opteron..., and have io device on other
node/link, and use mmconfig to access that root bus on different
segment?

I used to think that we need bigSMP to support pci segments on AMD platfrom.

YH

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2007-08-08  2:59 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-08-03 22:10 Oops in 2.6.23-rc1-git9, arch/x86_64/pci/k8-bus.c::fill_mp_bus_to_cpumask() Chuck Ebbert
2007-08-03 22:50 ` Andrew Morton
2007-08-04  6:17   ` Muli Ben-Yehuda
2007-08-04  9:30   ` Andi Kleen
2007-08-04 16:32     ` Andrew Morton
2007-08-04 17:45       ` Yinghai Lu
2007-08-04 18:15         ` Andrew Morton
2007-08-04 19:02           ` Yinghai Lu
2007-08-05  5:52             ` Andrew Morton
2007-08-05  6:02               ` Muli Ben-Yehuda
2007-08-05  6:07                 ` Yinghai Lu
2007-08-05  6:11                   ` Muli Ben-Yehuda
2007-08-05  6:24                     ` Yinghai Lu
2007-08-05  6:27                       ` Yinghai Lu
2007-08-05  6:04               ` Yinghai Lu
2007-08-04 23:40       ` Andi Kleen
2007-08-05  4:15         ` Muli Ben-Yehuda
2007-08-05  4:33           ` Yinghai Lu
2007-08-05  5:00             ` Muli Ben-Yehuda
2007-08-05  4:31         ` Yinghai Lu
2007-08-05  5:04         ` Muli Ben-Yehuda
2007-08-05  5:38           ` Yinghai Lu
2007-08-05  7:53             ` [PATCH/RFT] finish i386 and x86-64 sysdata conversion Muli Ben-Yehuda
2007-08-05  8:49               ` Yinghai Lu
2007-08-05 11:54                 ` Muli Ben-Yehuda
2007-08-05 16:39                   ` Yinghai Lu
2007-08-05 17:36                     ` Jeff Garzik
2007-08-05 20:41                       ` Yinghai Lu
2007-08-07 22:49               ` Andrew Morton
2007-08-07 22:56                 ` Muli Ben-Yehuda
2007-08-08  0:43                   ` Jeff Garzik
2007-08-08  1:09                     ` Yinghai Lu
2007-08-08  1:21                       ` Jeff Garzik
2007-08-08  1:28                         ` Yinghai Lu
2007-08-08  2:59                         ` Yinghai Lu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).