LKML Archive on lore.kernel.org
* i82875p_edac: BAR 0 collision
@ 2008-11-07 11:43 Jarkko Lavinen
  2008-11-11  8:01 ` Andrew Morton
  0 siblings, 1 reply; 6+ messages in thread
From: Jarkko Lavinen @ 2008-11-07 11:43 UTC (permalink / raw)
  To: linux-kernel

When I try to load the i82875p_edac module on an ASUS P4C800 Deluxe under
2.6.26 or 2.6.27, it fails due to a BAR 0 collision. On 2.6.25,
i82875p_edac works just fine.

Should i82875p_setup_overfl_dev() do some additional work to fix the
missing resource of the hidden overflow device?

When I try to load the i82875p_edac module on 2.6.27 I get:

  # modprobe i82875p_edac
  FATAL: Error inserting i82875p_edac
  (/lib/modules/2.6.27.4/kernel/drivers/edac/i82875p_edac.ko): No such device

And dmesg shows (from 2.6.27.4):

  EDAC MC: Ver: 2.1.0 Oct 30 2008
  EDAC DEBUG: edac_pci_dev_parity_clear()
  ...
  EDAC DEBUG: edac_pci_dev_parity_clear()
  EDAC DEBUG: edac_sysfs_setup_mc_kset()
  EDAC DEBUG: edac_sysfs_setup_mc_kset() Registered '.../edac/mc' kobject
  EDAC DEBUG: i82875p_init_one()
  EDAC i82875p: i82875p init one
  EDAC DEBUG: i82875p_probe1()
  PCI: 0000:00:06.0 reg 10 32bit mmio: [fecf0000, fecf0fff]
  pci 0000:00:06.0: device not available because of BAR 0 [0xfecf0000-0xfecf0fff] collisions
  EDAC i82875p: i82875p_setup_overfl_dev(): Failed to enable overflow device
  EDAC DEBUG: 875p init fail

On 2.6.25.19 loading i82875p_edac works just fine and dmesg shows:

  EDAC MC: Ver: 2.1.0 Nov  4 2008
  EDAC i82875p: i82875p init one
  EDAC MC0: Giving out device to 'i82875p_edac' 'i82875p': DEV 0000:00:00.0
  EDAC PCI0: Giving out device to module 'i82875p_edac' controller 'EDAC PCI controller': DEV '0000:00:00.0' (POLLED)

Cheers
Jarkko Lavinen


* Re: i82875p_edac: BAR 0 collision
  2008-11-07 11:43 i82875p_edac: BAR 0 collision Jarkko Lavinen
@ 2008-11-11  8:01 ` Andrew Morton
  2008-11-12 21:06   ` Jarkko Lavinen
                     ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Andrew Morton @ 2008-11-11  8:01 UTC (permalink / raw)
  To: Jarkko Lavinen; +Cc: linux-kernel, Doug Thompson, Jesse Barnes

(cc's added)

On Fri, 7 Nov 2008 13:43:55 +0200 Jarkko Lavinen <jlavi@iki.fi> wrote:

> When I try to load i82875p_edac module on ASUS P4C800 Deluxe in 2.6.26
> or 2.6.27 it fails due to BAR 0 collision. On 2.6.25 the
> i82875p_edac works just fine.
> 
> Should i82875p_setup_overfl_dev() do some additional work to fix the
> missing resource of the hidden overflow device?
> 
> When I try to load the i82875p_edac module on 2.6.27 I get:
> 
>   # modprobe i82875p_edac
>   FATAL: Error inserting i82875p_edac
>   (/lib/modules/2.6.27.4/kernel/drivers/edac/i82875p_edac.ko): No such device
> 
> And dmesg shows (from 2.6.27.4):
> 
>   EDAC MC: Ver: 2.1.0 Oct 30 2008
>   EDAC DEBUG: edac_pci_dev_parity_clear()
>   ...
>   EDAC DEBUG: edac_pci_dev_parity_clear()
>   EDAC DEBUG: edac_sysfs_setup_mc_kset()
>   EDAC DEBUG: edac_sysfs_setup_mc_kset() Registered '.../edac/mc' kobject
>   EDAC DEBUG: i82875p_init_one()
>   EDAC i82875p: i82875p init one
>   EDAC DEBUG: i82875p_probe1()
>   PCI: 0000:00:06.0 reg 10 32bit mmio: [fecf0000, fecf0fff]
>   pci 0000:00:06.0: device not available because of BAR 0 [0xfecf0000-0xfecf0fff] collisions
>   EDAC i82875p: i82875p_setup_overfl_dev(): Failed to enable overflow device
>   EDAC DEBUG: 875p init fail
> 
> On 2.6.25.19 loading i82875p_edac works just fine and dmesg shows:
> 
>   EDAC MC: Ver: 2.1.0 Nov  4 2008
>   EDAC i82875p: i82875p init one
>   EDAC MC0: Giving out device to 'i82875p_edac' 'i82875p': DEV 0000:00:00.0
>   EDAC PCI0: Giving out device to module 'i82875p_edac' controller 'EDAC PCI controller': DEV '0000:00:00.0' (POLLED)
> 

Might be an EDAC driver regression.  It might also be a consequence of
PCI address space management fiddlings, but I think most of the changes
there post-date 2.6.26?



* Re: i82875p_edac: BAR 0 collision
  2008-11-11  8:01 ` Andrew Morton
@ 2008-11-12 21:06   ` Jarkko Lavinen
  2008-11-18 21:57   ` Jarkko Lavinen
  2008-11-23 20:44   ` Jarkko Lavinen
  2 siblings, 0 replies; 6+ messages in thread
From: Jarkko Lavinen @ 2008-11-12 21:06 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Doug Thompson, Jesse Barnes

On Tue, Nov 11, 2008 at 12:01:52AM -0800, Andrew Morton wrote:
> Might be an EDAC driver regression.  It might also be a consequence of
> PCI address space management fiddlings, but I think most of the changes
> there post-date 2.6.26?

I tried calling pci_bus_assign_resources() before enabling the
overflow device in 2.6.27.  The driver now loads, but then
I get false memory error reports, and removing the module segfaults.
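
A minimal userspace sketch of why assigning resources first lets the
enable step succeed. This is not kernel code: the struct, the function
names and the bridge window are hypothetical, and it assumes that the
"collisions" message is printed when a BAR's resource was never inserted
into the resource tree (modelled here as a NULL parent).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for a PCI BAR resource. */
struct bar_res {
	unsigned long start;
	unsigned long end;
	const struct bar_res *parent;	/* NULL: not in the resource tree */
};

/* Models "enable device": refuse a BAR that was never claimed. */
static int model_enable(const struct bar_res *bar)
{
	if (bar->parent == NULL) {
		fprintf(stderr, "BAR 0 [%#lx-%#lx] not claimed\n",
			bar->start, bar->end);
		return -1;
	}
	return 0;
}

/* Models resource assignment: place the BAR at the start of a window. */
static int model_assign(struct bar_res *bar, const struct bar_res *window)
{
	unsigned long size = bar->end - bar->start + 1;

	if (window->end - window->start + 1 < size)
		return -1;		/* window too small for this BAR */
	bar->start = window->start;
	bar->end = window->start + size - 1;
	bar->parent = window;		/* now part of the tree */
	return 0;
}
```

In this model, enabling the freshly revealed device fails until
model_assign() has run, which mirrors the ordering tried above.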

Those false memory errors with the i82875p EDAC driver are nothing
new; they have been reported many times over the past years. On most
kernels i82875p_edac is totally unusable because of these false
errors.

I ran "modprobe i82875p_edac; sleep 5; rmmod i82875p_edac" on
2.6.27.5, with i82875p_setup_overfl_dev() modified to call
pci_bus_assign_resources() and to be more verbose:

[  235.248583] EDAC DEBUG: i82875p_init_one()
[  235.248592] EDAC i82875p: i82875p init one
[  235.248689] EDAC DEBUG: i82875p_probe1()
[  235.248700] i82875p_setup_overfl_dev pci_get_device: 00000000
[  235.248813] pci 0000:00:06.0: found [8086/257e] class 000880 header type 00
[  235.248827] PCI: 0000:00:06.0 reg 10 32bit mmio: [fecf0000, fecf0fff]
[  235.248875] pci 0000:00:06.0: calling pci_fixup_transparent_bridge+0x0/0x2b
[  235.249676] i82875p_setup_overfl_dev pci_scan_device: f6c4f000
[  235.250982] i82875p_setup_overfl_dev calling pci_bus_assign_resources for dev->bus before enabling the device
[  235.251178] pci 0000:00:06.0: BAR 0: got res [0x88101000-0x88101fff] bus [0x88101000-0x88101fff] flags 0x20200
[  235.251190] pci 0000:00:06.0: BAR 0: moved to bus [0x88101000-0x88101fff] flags 0x20200
[  235.251196] pci 0000:00:01.0: PCI bridge, secondary bus 0000:01
[  235.251290] pci 0000:00:01.0:   IO window: disabled
[  235.251386] pci 0000:00:01.0:   MEM window: 0xfd900000-0xfe9fffff
[  235.251483] pci 0000:00:01.0:   PREFETCH window: 0x000000f3f00000-0x000000f7efffff
[  235.251631] pci 0000:00:1e.0: PCI bridge, secondary bus 0000:02
[  235.251727] pci 0000:00:1e.0:   IO window: 0xd000-0xdfff
[  235.251823] pci 0000:00:1e.0:   MEM window: 0xfea00000-0xfeafffff
[  235.251921] pci 0000:00:1e.0:   PREFETCH window: 0x00000088000000-0x000000880fffff
[  235.252092] EDAC DEBUG: edac_mc_register_sysfs_main_kobj()
[  235.253209] EDAC DEBUG: edac_mc_register_sysfs_main_kobj() Registered '.../edac/mc0' kobject
[  235.253233] EDAC DEBUG: edac_mc_add_mc()
[  235.253456] EDAC DEBUG: edac_create_sysfs_mci_device() idx=0
[  235.253654] EDAC DEBUG: edac_mc_workq_setup()
[  235.253665] EDAC MC0: Giving out device to 'i82875p_edac' 'i82875p': DEV 0000:00:00.0
[  235.253810] EDAC DEBUG: edac_pci_alloc_ctl_info()
[  235.253815] EDAC DEBUG: edac_pci_add_device()
[  235.254214] EDAC DEBUG: add_edac_pci_to_global_list()
[  235.254223] EDAC DEBUG: find_edac_pci_by_dev()
[  235.254227] EDAC DEBUG: edac_pci_create_sysfs() idx=0
[  235.254231] EDAC DEBUG: edac_pci_main_kobj_setup()
[  235.254603] EDAC DEBUG: Registered '.../edac/pci' kobject
[  235.254611] EDAC DEBUG: edac_pci_create_instance_kobj()
[  235.254843] EDAC DEBUG: edac_pci_create_instance_kobj() Register instance 'pci0' kobject
[  235.254984] EDAC DEBUG: edac_pci_workq_setup()
[  235.254996] EDAC PCI0: Giving out device to module 'i82875p_edac' controller 'EDAC PCI controller': DEV '0000:00:00.0' (POLLED)
[  236.253020] EDAC DEBUG: MC0: i82875p_check()
[  236.253043] EDAC DEBUG: MC0: edac_mc_find_csrow_by_page(): 0x8f9
[  236.253051] EDAC MC0: UE page 0x8f9, offset 0x0, grain 4096, row 0, labels ":": i82875p UE
[  237.253020] EDAC DEBUG: MC0: i82875p_check()
[  237.253043] EDAC DEBUG: MC0: edac_mc_find_csrow_by_page(): 0x9ae
[  237.253050] EDAC MC0: UE page 0x9ae, offset 0x0, grain 4096, row 0, labels ":": i82875p UE
[  238.253018] EDAC DEBUG: MC0: i82875p_check()
[  238.253040] EDAC DEBUG: MC0: edac_mc_find_csrow_by_page(): 0x25fe
[  238.253047] EDAC MC0: UE page 0x25fe, offset 0x0, grain 4096, row 0, labels ":": i82875p UE
[  239.253017] EDAC DEBUG: MC0: i82875p_check()
[  239.253039] EDAC DEBUG: MC0: edac_mc_find_csrow_by_page(): 0x25fe
[  239.253046] EDAC MC0: UE page 0x25fe, offset 0x0, grain 4096, row 0, labels ":": i82875p UE
[  240.253018] EDAC DEBUG: MC0: i82875p_check()
[  240.253040] EDAC DEBUG: MC0: edac_mc_find_csrow_by_page(): 0x25fe
[  240.253047] EDAC MC0: UE page 0x25fe, offset 0x0, grain 4096, row 0, labels ":": i82875p UE
[  241.253025] BUG: unable to handle kernel paging request at f898c10d
[  241.253203] IP: [<f898c10d>]
[  241.253330] *pde = 37814067 *pte = 00000000 
[  241.253501] Oops: 0000 [#1] SMP 
[  241.253665] Modules linked in: edac_core skge [last unloaded: i82875p_edac]
[  241.253911] 
[  241.253997] Pid: 2721, comm: edac-poller Not tainted (2.6.27.5 #1)
[  241.254005] EIP: 0060:[<f898c10d>] EFLAGS: 00010282 CPU: 0
[  241.254005] EIP is at 0xf898c10d
[  241.254005] EAX: f708f000 EBX: f708f000 ECX: f708f0b4 EDX: f898c10d
[  241.254005] ESI: f708f0b0 EDI: f8998caf EBP: f6ca3f90 ESP: f6ca3f88
[  241.254005]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[  241.254005] Process edac-poller (pid: 2721, ti=f6ca2000 task=f7831da0 task.ti=f6ca2000)
[  241.254005] Stack: f8998d02 f7187b80 f6ca3fa8 c022ea8e f7187b84 f7187b80 f7187b84 f7187b8c 
[  241.254005]        f6ca3fd0 c022ebba 00000000 f7831da0 c0231887 f6ca3fbc f6ca3fbc f7187b80 
[  241.254005]        c022eb03 00000000 f6ca3fe0 c02315fe c02315c3 00000000 00000000 c02044ab 
[  241.254005] Call Trace:
[  241.254005]  [<f8998d02>] ? edac_mc_workq_function+0x53/0x7c [edac_core]
[  241.254005]  [<c022ea8e>] ? run_workqueue+0x71/0xe6
[  241.254005]  [<c022ebba>] ? worker_thread+0xb7/0xc3
[  241.254005]  [<c0231887>] ? autoremove_wake_function+0x0/0x33
[  241.254005]  [<c022eb03>] ? worker_thread+0x0/0xc3
[  241.254005]  [<c02315fe>] ? kthread+0x3b/0x61
[  241.254005]  [<c02315c3>] ? kthread+0x0/0x61
[  241.254005]  [<c02044ab>] ? kernel_thread_helper+0x7/0x10
[  241.254005]  =======================
[  241.254005] Code:  Bad EIP value.
[  241.254005] EIP: [<f898c10d>] 0xf898c10d SS:ESP 0068:f6ca3f88
[  241.254005] ---[ end trace 9169bd6e3112abca ]---

Cheers
Jarkko Lavinen


* Re: i82875p_edac: BAR 0 collision
  2008-11-11  8:01 ` Andrew Morton
  2008-11-12 21:06   ` Jarkko Lavinen
@ 2008-11-18 21:57   ` Jarkko Lavinen
  2008-11-23 20:44   ` Jarkko Lavinen
  2 siblings, 0 replies; 6+ messages in thread
From: Jarkko Lavinen @ 2008-11-18 21:57 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Doug Thompson, Jesse Barnes

On Tue, Nov 11, 2008 at 12:01:52AM -0800, Andrew Morton wrote:
> On Fri, 7 Nov 2008 13:43:55 +0200 Jarkko Lavinen <jlavi@iki.fi> wrote:
> > When I try to load the i82875p_edac module on 2.6.27 I get:
> > 
> >   # modprobe i82875p_edac
> >   FATAL: Error inserting i82875p_edac
> >   (/lib/modules/2.6.27.4/kernel/drivers/edac/i82875p_edac.ko): No such device

> Might be an EDAC driver regression.  It might also be a consequence of
> PCI address space management fiddlings, but I think most of the changes
> there post-date 2.6.26?

I can get around the modprobe problem by adding the missing resource after 
the hidden overflow device is revealed. The diff below is against 2.6.27.

	--- a/drivers/edac/i82875p_edac.c
	+++ b/drivers/edac/i82875p_edac.c
	@@ -295,6 +295,7 @@ static int i82875p_setup_overfl_dev(struct pci_dev *pdev,
	                                "%s(): pci_bus_add_device() Failed\n",
	                                __func__);
	                }
	+               pci_bus_assign_resources(dev->bus);
	        }
	 
	        *ovrfl_pdev = dev;

The access violation when doing "rmmod i82875p_edac" occurs
because the module exit function i82875p_exit() runs
pci_unregister_driver() while the edac_mc_workq_function() is
scheduled to be run.  When the work queue runs, it accesses
something not available anymore.

	static void __exit i82875p_exit(void)
	{
		debugf3("%s()\n", __func__);

	+	mci_saved->op_state = OP_OFFLINE;

		pci_unregister_driver(&i82875p_driver);

		if (!i82875p_registered) {
			i82875p_remove_one(mci_pdev);
			pci_dev_put(mci_pdev);
		}
	}
		

I tried to stop the workqueue by saving the mci pointer at the time
of its allocation and setting its state to OP_OFFLINE just before
calling pci_unregister_driver().

This isn't the right way to remove the module: edac_core still has a
refcount of 2 after i82875p_edac has been removed. The refcount
should be 0.
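
A minimal userspace sketch of the teardown-ordering problem described
above. It is a single-threaded model, not kernel code, and every name in
it is hypothetical: a poll tick that fires after the module is gone
stands in for the "Bad EIP value" oops.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a module with a periodic poller. */
struct model {
	bool poll_armed;	/* poll work is still scheduled */
	bool module_loaded;	/* module code and data still exist */
};

/* One poll tick; returns -1 if it would run into unloaded code. */
static int poll_tick(const struct model *m)
{
	if (!m->poll_armed)
		return 0;	/* nothing scheduled, nothing runs */
	if (!m->module_loaded)
		return -1;	/* models the oops seen on rmmod */
	return 0;
}

/* Unsafe order: unload first, the pending poll fires afterwards. */
static int teardown_unsafe(struct model *m)
{
	m->module_loaded = false;
	return poll_tick(m);
}

/* Safe order: stop the poller, then unload. */
static int teardown_safe(struct model *m)
{
	m->poll_armed = false;
	m->module_loaded = false;
	return poll_tick(m);
}
```

Only the order of the two steps differs between the variants, which is
the point: the poller has to be stopped before anything it touches
goes away.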

Cheers
Jarkko Lavinen


* Re: i82875p_edac: BAR 0 collision
  2008-11-11  8:01 ` Andrew Morton
  2008-11-12 21:06   ` Jarkko Lavinen
  2008-11-18 21:57   ` Jarkko Lavinen
@ 2008-11-23 20:44   ` Jarkko Lavinen
  2008-12-16 19:39     ` Jesse Barnes
  2 siblings, 1 reply; 6+ messages in thread
From: Jarkko Lavinen @ 2008-11-23 20:44 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, bluesmoke-devel, Doug Thompson, Jesse Barnes

On Tue, Nov 11, 2008 at 12:01:52AM -0800, Andrew Morton wrote:
> Might be an EDAC driver regression.  It might also be a consequence of
> PCI address space management fiddlings, but I think most of the changes
> there post-date 2.6.26?

There were 3 issues:
 1. PCI resources of the hidden overflow device had to be added separately.
 2. In module exit there is one more mci struct kobject put than
    corresponding gets.
 3. The edac_mc workqueue must be stopped before the polled memory
    area disappears.
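
A minimal userspace sketch of the reference-count imbalance in the
second issue. It is a simplified model with hypothetical names, not the
kobject API: teardown drops two references, so an object created with a
single reference is released too early unless one extra reference is
taken beforehand.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical refcounted object standing in for the mci kobject. */
struct obj {
	int refs;
	bool freed;
	int bad_puts;	/* puts that arrived after the object was freed */
};

static void obj_get(struct obj *o)
{
	o->refs++;
}

static void obj_put(struct obj *o)
{
	if (o->freed) {
		o->bad_puts++;	/* would be a use-after-free */
		return;
	}
	if (--o->refs == 0)
		o->freed = true;
}

/* Teardown puts twice; returns how many puts hit a freed object. */
static int teardown_bad_puts(bool extra_get)
{
	struct obj o = { 1, false, 0 };	/* allocation holds one reference */

	if (extra_get)
		obj_get(&o);	/* balances the second put */
	obj_put(&o);		/* models the put during sysfs removal */
	obj_put(&o);		/* models the put in the final free */
	return o.bad_puts;
}
```

Without the extra get the second put lands on an already freed object;
with it, the count reaches zero exactly on the last put.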

The attached patch fixes these problems. The module now loads and
unloads without problems, with PCI, EDAC, slab and kobject debug
options enabled.

The i82975x code looks so similar to i82875p that it likely has the
same ref count issues as i82875p.

Cheers
Jarkko Lavinen

>From 3abc62242a219d1466b32b59bb88b4c1b0e86a65 Mon Sep 17 00:00:00 2001
From: Jarkko Lavinen <jlavi@iki.fi>
Date: Sun, 23 Nov 2008 22:18:39 +0200
Subject: [PATCH] i82875p_edac: Fix module init and exit

The PCI resources of the hidden overflow device are missing after the
overflow device is revealed, so they must be added separately.

On exit, both edac_remove_sysfs_mci_device() in edac_mc_del_mc()
and edac_mc_free() in i82875p_remove_one() decrement the mci ref count.
Use an additional kobject_get() to keep mci valid from edac_mc_del_mc()
until the final edac_mc_free().

Also, i82875p_remove_one() should be called before pci_unregister_driver()
to stop the polling before the checked memory area disappears.

Signed-off-by: Jarkko Lavinen <jlavi@iki.fi>
---
 drivers/edac/i82875p_edac.c |   14 ++++++++------
 1 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/edac/i82875p_edac.c b/drivers/edac/i82875p_edac.c
index e43bdc4..ebb037b 100644
--- a/drivers/edac/i82875p_edac.c
+++ b/drivers/edac/i82875p_edac.c
@@ -182,8 +182,6 @@ static struct pci_dev *mci_pdev;	/* init dev: in case that AGP code has
 					 * already registered driver
 					 */
 
-static int i82875p_registered = 1;
-
 static struct edac_pci_ctl_info *i82875p_pci;
 
 static void i82875p_get_error_info(struct mem_ctl_info *mci,
@@ -295,6 +293,7 @@ static int i82875p_setup_overfl_dev(struct pci_dev *pdev,
 				"%s(): pci_bus_add_device() Failed\n",
 				__func__);
 		}
+		pci_bus_assign_resources(dev->bus);
 	}
 
 	*ovrfl_pdev = dev;
@@ -409,6 +408,9 @@ static int i82875p_probe1(struct pci_dev *pdev, int dev_idx)
 		goto fail0;
 	}
 
+	/* Keeps mci available after edac_mc_del_mc() till edac_mc_free() */
+	kobject_get(&mci->edac_mci_kobj);
+
 	debugf3("%s(): init mci\n", __func__);
 	mci->dev = &pdev->dev;
 	mci->mtype_cap = MEM_FLAG_DDR;
@@ -451,6 +453,7 @@ static int i82875p_probe1(struct pci_dev *pdev, int dev_idx)
 	return 0;
 
 fail1:
+	kobject_put(&mci->edac_mci_kobj);
 	edac_mc_free(mci);
 
 fail0:
@@ -578,12 +581,11 @@ static void __exit i82875p_exit(void)
 {
 	debugf3("%s()\n", __func__);
 
+	i82875p_remove_one(mci_pdev);
+	pci_dev_put(mci_pdev);
+
 	pci_unregister_driver(&i82875p_driver);
 
-	if (!i82875p_registered) {
-		i82875p_remove_one(mci_pdev);
-		pci_dev_put(mci_pdev);
-	}
 }
 
 module_init(i82875p_init);
-- 
1.5.6.5



* Re: i82875p_edac: BAR 0 collision
  2008-11-23 20:44   ` Jarkko Lavinen
@ 2008-12-16 19:39     ` Jesse Barnes
  0 siblings, 0 replies; 6+ messages in thread
From: Jesse Barnes @ 2008-12-16 19:39 UTC (permalink / raw)
  To: Jarkko Lavinen
  Cc: Andrew Morton, linux-kernel, bluesmoke-devel, Doug Thompson

On Sunday, November 23, 2008 12:44 pm Jarkko Lavinen wrote:
> On Tue, Nov 11, 2008 at 12:01:52AM -0800, Andrew Morton wrote:
> > Might be an EDAC driver regression.  It might also be a consequence of
> > PCI address space management fiddlings, but I think most of the changes
> > there post-date 2.6.26?
>
> There were 3 issues:
>  1. PCI resources of the hidden overflow device had to be added separately.
>  2. In module exit there is one more mci struct kobject put than
>     corresponding gets.
>  3. The edac_mc workqueue must be stopped before the polled memory
>     area disappears.
>
> The attached patch fixes these problems. The module now loads and
> unloads without problems, with PCI, EDAC, slab and kobject debug
> options enabled.
>
> The i82975x code looks so similar to i82875p that it likely has the
> same ref count issues as i82875p.

It's a little funky for a driver to be calling pci_bus_assign_resources; but 
if it really does act like a bus (sounds like this device does) it might be 
appropriate.  I don't have any problems with the patch, but it's really up to 
Doug.

-- 
Jesse Barnes, Intel Open Source Technology Center


