LKML Archive on lore.kernel.org
From: Andrew Morton <akpm@linux-foundation.org>
To: Andi Kleen <ak@suse.de>
Cc: Chuck Ebbert <cebbert@redhat.com>,
	Muli Ben-Yehuda <muli@il.ibm.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	riku.seppala@kymp.net, Andy Whitcroft <apw@shadowen.org>
Subject: Re: Oops in 2.6.23-rc1-git9, arch/x86_64/pci/k8-bus.c::fill_mp_bus_to_cpumask()
Date: Sat, 4 Aug 2007 09:32:22 -0700
Message-ID: <20070804093222.f0d7f3c7.akpm@linux-foundation.org>
In-Reply-To: <200708041130.42038.ak@suse.de>

On Sat, 4 Aug 2007 11:30:41 +0200 Andi Kleen <ak@suse.de> wrote:

> On Saturday 04 August 2007 00:50, Andrew Morton wrote:
> > On Fri, 03 Aug 2007 18:10:03 -0400
> >
> > Chuck Ebbert <cebbert@redhat.com> wrote:
> > > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=250859
> > >
> > > at line 74:
> > >
> > > muli@62829:
> > > muli@62829: 					sd = bus->sysdata;
> > > muli@62829: 					sd->node = node;   <=====
> > >
> > > bus->sysdata is NULL.
> > >
> > > Last changed by this hunk of
> > > "x86-64: introduce struct pci_sysdata to facilitate sharing of
> > > ->sysdata":
> 
> Hmm, will double check. Perhaps Muli's conversion was incomplete.

hm.

> > > @@ -67,7 +69,9 @@ fill_mp_bus_to_cpumask(void)
> > >  						continue;
> > >  					if (!node_online(node))
> > >  						node = 0;
> > > -					bus->sysdata = (void *)node;
> > > +
> > > +					sd = bus->sysdata;
> > > +					sd->node = node;
> > >  				}
> > >  			}
> > >  		}
> >
> > Andy keeps trotting out a patch which will probably fix this,
> 
> What patch do you mean? I don't have anything sysdata related
> left over.
> 

"pci device ensure sysdata initialised", now at version 4.



From: Andy Whitcroft <apw@shadowen.org>

We have been seeing panics on NUMA systems in pci_call_probe() in
2.6.19-rc1-mm1 and later.  This is related to the changes introduced in the
commit below:

    [x86, PCI] Switch pci_bus::sysdata from NUMA node integer to a pointer
    0a247a58fc3e2ecfc17654301033e8b8d08df2a2

In this change, sysdata changed from directly representing a value (the
NUMA node number) to a pointer to a structure.  However, it seems that we
do not always initialise this sysdata before we probe the device.
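To illustrate why a bus left with a NULL sysdata now oopses, here is a
minimal user-space sketch, not kernel code.  struct pci_sysdata mirrors
the one in include/asm-i386/pci.h; struct mock_bus, old_bus_to_node() and
new_bus_to_node() are hypothetical stand-ins for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors struct pci_sysdata from include/asm-i386/pci.h. */
struct pci_sysdata {
	int node;	/* NUMA node */
};

/* Hypothetical, cut-down stand-in for struct pci_bus. */
struct mock_bus {
	void *sysdata;
};

/* Old scheme: the node number was stored in the pointer value itself,
 * so a NULL sysdata simply decoded to node 0. */
static int old_bus_to_node(const struct mock_bus *bus)
{
	return (int)(long)bus->sysdata;
}

/* New scheme: sysdata points at a struct pci_sysdata and must be
 * dereferenced.  With a NULL sysdata this dereference is what faults;
 * the assert states the precondition explicitly. */
static int new_bus_to_node(const struct mock_bus *bus)
{
	const struct pci_sysdata *sd = bus->sysdata;

	assert(sd != NULL);
	return sd->node;
}
```

The old encoding tolerated NULL by accident (NULL decodes to 0); the
pointer encoding has no such fallback, so every bus needs a valid sysdata.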

Prior to the changes above, a NULL sysdata decoded to node 0, so such
devices were allocated to node 0 unconditionally.  This patch adds a
default sysdata entry (pci_default_sysdata), which is then used wherever
NULL was used previously.  pci_default_sysdata sets the node to unknown
(-1).  This is a more accurate assignment, mirroring the value returned
where no topology support is provided and no locality information is
available.
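A minimal user-space sketch of the idea, assuming a simplified bus scan:
pci_default_sysdata is initialised as in the patch, while struct mock_bus,
mock_scan_bus() and mock_bus_to_node() are hypothetical stand-ins for
struct pci_bus, pci_scan_bus() and the node lookup.

```c
#include <assert.h>
#include <stddef.h>

struct pci_sysdata {
	int node;	/* NUMA node, -1 when unknown */
};

/* As in the patch: one shared default meaning "node unknown". */
struct pci_sysdata pci_default_sysdata = { .node = -1 };

/* Hypothetical stand-in for struct pci_bus. */
struct mock_bus {
	int number;
	void *sysdata;
};

/* Hypothetical stand-in for pci_scan_bus(): it stores whatever sysdata
 * the caller passes, which is why callers must not pass NULL. */
static void mock_scan_bus(struct mock_bus *bus, int busnr, void *sysdata)
{
	bus->number = busnr;
	bus->sysdata = sysdata;
}

static int mock_bus_to_node(const struct mock_bus *bus)
{
	const struct pci_sysdata *sd = bus->sysdata;

	assert(sd != NULL);	/* the invariant the default restores */
	return sd->node;
}
```

Passing &pci_default_sysdata instead of NULL keeps the lookup safe and
reports "unknown" (-1) rather than silently claiming node 0.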

There are only two uses of this value in the affected architectures
(x86, x86_64) and generic code:

1) in x86_64, dma_alloc_pages() looks up the node in order to
   allocate node local memory.  Here if the node is invalid we
   will default to the first online node.  Behaviour here should
   be unchanged.
2) in generic code, pci_call_probe() looks up the node in order to
   restrict execution of the probe to the card-local node, to
   favor node-local allocation.  Previously, where the node was
   unknown, we would force execution (and thereby allocation) onto
   node 0; this is arguably wrong, and using -1 releases this restriction.
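The two consumers above can be sketched in user space as follows.  This
is a simplified model, not the kernel code: the online-node bitmask and
all mock_* helpers are hypothetical, and the DMA fallback is reduced to
"an invalid node maps to the first online node" as described in 1).

```c
#include <assert.h>
#include <stdbool.h>

#define MOCK_MAX_NODES 8

/* Hypothetical online-node bitmask: bit n set means node n is online. */
static unsigned int mock_online_mask = 0x3;	/* nodes 0 and 1 */

static bool mock_node_online(int node)
{
	return node >= 0 && node < MOCK_MAX_NODES &&
	       (mock_online_mask & (1u << node)) != 0;
}

static int mock_first_online_node(void)
{
	for (int n = 0; n < MOCK_MAX_NODES; n++)
		if (mock_node_online(n))
			return n;
	return -1;
}

/* Use 1 (DMA pages): an invalid node such as -1 falls back to the first
 * online node, so behaviour matches the old node-0 default whenever
 * node 0 is online. */
static int mock_dma_alloc_node(int bus_node)
{
	int first = mock_first_online_node();

	assert(first >= 0);	/* at least one node must be online */
	return mock_node_online(bus_node) ? bus_node : first;
}

/* Use 2 (probe placement): return the node whose CPUs the probe should
 * be bound to, or -1 for "no restriction". */
static int mock_probe_bind_node(int bus_node)
{
	return mock_node_online(bus_node) ? bus_node : -1;
}
```

With -1 the DMA path is unchanged, while the probe path is no longer
pinned to node 0.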

In an ideal world we should be supplying a sysdata for the
appropriate node where it is known.  Where it is not known, defaulting
to -1 seems a better course, and would help us where node 0 is
short of memory.
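One possible shape for "supply a sysdata where the node is known", as a
hedged user-space sketch: mock_bus_sysdata_for() and
mock_bus_sysdata_release() are hypothetical helpers, not kernel APIs.
Because pci_default_sysdata is shared, code that writes through
bus->sysdata (as the sd->node = node line quoted earlier does) would
change the node for every bus using the default; per-bus copies avoid
that.

```c
#include <assert.h>
#include <stdlib.h>

struct pci_sysdata {
	int node;	/* NUMA node, -1 when unknown */
};

struct pci_sysdata pci_default_sysdata = { .node = -1 };

/* Hypothetical helper: give a bus its own sysdata when its node is
 * known; otherwise (or if allocation fails) fall back to the shared
 * default, so the result is never NULL. */
static struct pci_sysdata *mock_bus_sysdata_for(int node)
{
	struct pci_sysdata *sd;

	if (node < 0)
		return &pci_default_sysdata;

	sd = calloc(1, sizeof(*sd));
	if (!sd)
		return &pci_default_sysdata;
	sd->node = node;
	return sd;
}

/* Free only per-bus copies, never the shared default. */
static void mock_bus_sysdata_release(struct pci_sysdata *sd)
{
	assert(sd != NULL);	/* callers never hold a NULL sysdata */
	if (sd != &pci_default_sysdata)
		free(sd);
}
```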

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Yinghai Lu <yinghai.lu@sun.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/i386/pci/common.c   |    2 ++
 arch/i386/pci/fixup.c    |    8 +++++---
 arch/i386/pci/numa.c     |    8 +++++---
 arch/i386/pci/visws.c    |    4 ++--
 include/asm-i386/pci.h   |    1 +
 include/asm-x86_64/pci.h |    1 +
 6 files changed, 16 insertions(+), 8 deletions(-)

diff -puN arch/i386/pci/common.c~pci-device-ensure-sysdata-initialised-v4 arch/i386/pci/common.c
--- a/arch/i386/pci/common.c~pci-device-ensure-sysdata-initialised-v4
+++ a/arch/i386/pci/common.c
@@ -27,6 +27,8 @@ unsigned long pirq_table_addr;
 struct pci_bus *pci_root_bus;
 struct pci_raw_ops *raw_pci_ops;
 
+struct pci_sysdata pci_default_sysdata = { .node = -1 };
+
 static int pci_read(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *value)
 {
 	return raw_pci_ops->read(0, bus->number, devfn, where, size, value);
diff -puN arch/i386/pci/fixup.c~pci-device-ensure-sysdata-initialised-v4 arch/i386/pci/fixup.c
--- a/arch/i386/pci/fixup.c~pci-device-ensure-sysdata-initialised-v4
+++ a/arch/i386/pci/fixup.c
@@ -25,9 +25,11 @@ static void __devinit pci_fixup_i450nx(s
 		pci_read_config_byte(d, reg++, &subb);
 		DBG("i450NX PXB %d: %02x/%02x/%02x\n", pxb, busno, suba, subb);
 		if (busno)
-			pci_scan_bus(busno, &pci_root_ops, NULL);	/* Bus A */
+			pci_scan_bus(busno, &pci_root_ops,
+					&pci_default_sysdata);	/* Bus A */
 		if (suba < subb)
-			pci_scan_bus(suba+1, &pci_root_ops, NULL);	/* Bus B */
+			pci_scan_bus(suba+1, &pci_root_ops,
+					&pci_default_sysdata);	/* Bus B */
 	}
 	pcibios_last_bus = -1;
 }
@@ -42,7 +44,7 @@ static void __devinit pci_fixup_i450gx(s
 	u8 busno;
 	pci_read_config_byte(d, 0x4a, &busno);
 	printk(KERN_INFO "PCI: i440KX/GX host bridge %s: secondary bus %02x\n", pci_name(d), busno);
-	pci_scan_bus(busno, &pci_root_ops, NULL);
+	pci_scan_bus(busno, &pci_root_ops, &pci_default_sysdata);
 	pcibios_last_bus = -1;
 }
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82454GX, pci_fixup_i450gx);
diff -puN arch/i386/pci/numa.c~pci-device-ensure-sysdata-initialised-v4 arch/i386/pci/numa.c
--- a/arch/i386/pci/numa.c~pci-device-ensure-sysdata-initialised-v4
+++ a/arch/i386/pci/numa.c
@@ -97,9 +97,11 @@ static void __devinit pci_fixup_i450nx(s
 		pci_read_config_byte(d, reg++, &subb);
 		DBG("i450NX PXB %d: %02x/%02x/%02x\n", pxb, busno, suba, subb);
 		if (busno)
-			pci_scan_bus(QUADLOCAL2BUS(quad,busno), &pci_root_ops, NULL);	/* Bus A */
+			pci_scan_bus(QUADLOCAL2BUS(quad,busno), &pci_root_ops,
+					&pci_default_sysdata);	/* Bus A */
 		if (suba < subb)
-			pci_scan_bus(QUADLOCAL2BUS(quad,suba+1), &pci_root_ops, NULL);	/* Bus B */
+			pci_scan_bus(QUADLOCAL2BUS(quad,suba+1), &pci_root_ops,
+					&pci_default_sysdata);	/* Bus B */
 	}
 	pcibios_last_bus = -1;
 }
@@ -124,7 +126,7 @@ static int __init pci_numa_init(void)
 			printk("Scanning PCI bus %d for quad %d\n", 
 				QUADLOCAL2BUS(quad,0), quad);
 			pci_scan_bus(QUADLOCAL2BUS(quad,0), 
-				&pci_root_ops, NULL);
+				&pci_root_ops, &pci_default_sysdata);
 		}
 	return 0;
 }
diff -puN arch/i386/pci/visws.c~pci-device-ensure-sysdata-initialised-v4 arch/i386/pci/visws.c
--- a/arch/i386/pci/visws.c~pci-device-ensure-sysdata-initialised-v4
+++ a/arch/i386/pci/visws.c
@@ -101,8 +101,8 @@ static int __init pcibios_init(void)
 		"bridge B (PIIX4) bus: %u\n", pci_bus1, pci_bus0);
 
 	raw_pci_ops = &pci_direct_conf1;
-	pci_scan_bus(pci_bus0, &pci_root_ops, NULL);
-	pci_scan_bus(pci_bus1, &pci_root_ops, NULL);
+	pci_scan_bus(pci_bus0, &pci_root_ops, &pci_default_sysdata);
+	pci_scan_bus(pci_bus1, &pci_root_ops, &pci_default_sysdata);
 	pci_fixup_irqs(visws_swizzle, visws_map_irq);
 	pcibios_resource_survey();
 	return 0;
diff -puN include/asm-i386/pci.h~pci-device-ensure-sysdata-initialised-v4 include/asm-i386/pci.h
--- a/include/asm-i386/pci.h~pci-device-ensure-sysdata-initialised-v4
+++ a/include/asm-i386/pci.h
@@ -7,6 +7,7 @@
 struct pci_sysdata {
 	int		node;		/* NUMA node */
 };
+extern struct pci_sysdata pci_default_sysdata;
 
 #include <linux/mm.h>		/* for struct page */
 
diff -puN include/asm-x86_64/pci.h~pci-device-ensure-sysdata-initialised-v4 include/asm-x86_64/pci.h
--- a/include/asm-x86_64/pci.h~pci-device-ensure-sysdata-initialised-v4
+++ a/include/asm-x86_64/pci.h
@@ -9,6 +9,7 @@ struct pci_sysdata {
 	int		node;		/* NUMA node */
 	void*		iommu;		/* IOMMU private data */
 };
+extern struct pci_sysdata pci_default_sysdata;
 
 #ifdef CONFIG_CALGARY_IOMMU
 static inline void* pci_iommu(struct pci_bus *bus)
_

