LKML Archive on lore.kernel.org help / color / mirror / Atom feed
* [2.6.26] OOPS in __linkwatch_run_queue (unable to handle kernel NULL pointer dereference at 00000235) @ 2008-10-27 15:00 Folkert van Heusden 2008-10-30 9:09 ` Andrew Morton 0 siblings, 1 reply; 7+ messages in thread From: Folkert van Heusden @ 2008-10-27 15:00 UTC (permalink / raw) To: linux-kernel Hi, While running my http://vanheusden.com/pyk/ script (which randomly inserts and removes modules) I triggered the folllowing oops in a 2.6.26 kernel on a pre-ht pentium 4 (hp pavillion laptop type zv5231ea): [ 1037.480097] BUG: unable to handle kernel NULL pointer dereference at 00000235 [ 1037.480188] IP: [<c0261078>] __linkwatch_run_queue+0x6d/0x15a [ 1037.480256] *pde = 00000000 [ 1037.480315] Oops: 0000 [#1] SMP [ 1037.480393] Modules linked in: pcspkr(+) dm_snapshot joydev cdrom cpufreq_conservative snd_atiixp_modem(+) ohci_hcd(+) serio_raw wmi loop i2c_piix4 button ac sbp2 parport_pc lp psmouse output snd_seq_device pci_hotplug arc4 ecb crypto_blkcipher b43legacy input_polldev led_class rng_core rfkill_input rfkill cpufreq_stats ssb pcmcia firmware_class mac80211 parport cpufreq_ondemand dm_mirror i2c_core freq_table cfg80211 dm_log snd_ac97_codec ac97_bus ieee1394 scsi_mod snd_pcm snd_timer yenta_socket rsrc_nonstatic snd_page_alloc radeon drm rfcomm l2cap bluetooth ipv6 snd soundcore ati_agp agpgart evdev ext3 jbd mbcache dm_mod ide_disk 8139cp atiixp 8139too mii pcmcia_core ide_pci_generic ide_core usbcore processor thermal_sys [last unloaded: b44] [ 1037.482294] [ 1037.482328] Pid: 6, comm: events/0 Not tainted (2.6.26-1-686 #1) [ 1037.482365] EIP: 0060:[<c0261078>] EFLAGS: 00010246 CPU: 0 [ 1037.482403] EIP is at __linkwatch_run_queue+0x6d/0x15a [ 1037.482439] EAX: f744c000 EBX: 00000001 ECX: c03632a4 EDX: 01553000 [ 1037.482476] ESI: 00000003 EDI: 00000000 EBP: 00000001 ESP: f744df94 [ 1037.482536] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 [ 1037.482572] Process events/0 (pid: 6, ti=f744c000 task=f743e460 task.ti=f744c000) [ 1037.482609] Stack: f7414600 c03632a0 c0261165 00000000 c0261182 c012edee f7414600 c012f4c9 [ 1037.482855] f741460c c012f57c 00000000 f743e460 c013177c f744dfc8 f744dfc8 f7414600 [ 1037.483101] 00000000 c01316bb c0131683 00000000 c01044f3 f7441ef8 00000000 00000000 [ 1037.483346] Call Trace: [ 1037.483412] [<c0261165>] linkwatch_event+0x0/0x22 [ 1037.483474] [<c0261182>] linkwatch_event+0x1d/0x22 [ 1037.483534] [<c012edee>] run_workqueue+0x74/0xf2 [ 1037.483596] [<c012f4c9>] worker_thread+0x0/0xbd [ 1037.483656] [<c012f57c>] worker_thread+0xb3/0xbd [ 1037.483718] [<c013177c>] autoremove_wake_function+0x0/0x2d [ 1037.483785] [<c01316bb>] kthread+0x38/0x5d [ 1037.483842] [<c0131683>] kthread+0x0/0x5d [ 1037.483901] [<c01044f3>] kernel_thread_helper+0x7/0x10 [ 1037.483974] ======================= [ 1037.484007] Code: 80 05 00 8b 1d 9c b5 41 c0 c7 05 9c b5 41 c0 00 00 00 00 90 fe 05 90 b5 41 c0 fb 0f 1f 84 00 00 00 00 00 90 e9 ce 00 00 00 85 ff <8b> ab 34 02 00 00 74 17 89 d8 e8 7d fe ff ff 85 c0 75 0c 89 d8 [ 1037.484013] EIP: [<c0261078>] __linkwatch_run_queue+0x6d/0x15a SS:ESP 0068:f744df94 [ 1037.485642] ---[ end trace 69e935fd5d0e15c9 ]--- Folkert van Heusden -- Multi tail barnamaj mowahib li mora9abat attasjilat wa nataij awamir al 7asoub. damj, talwin, mora9abat attarchi7 wa ila akhirih. http://www.vanheusden.com/multitail/ ---------------------------------------------------------------------- Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [2.6.26] OOPS in __linkwatch_run_queue (unable to handle kernel NULL pointer dereference at 00000235) 2008-10-27 15:00 [2.6.26] OOPS in __linkwatch_run_queue (unable to handle kernel NULL pointer dereference at 00000235) Folkert van Heusden @ 2008-10-30 9:09 ` Andrew Morton 2008-11-17 8:40 ` [PATCH] " Jarek Poplawski 0 siblings, 1 reply; 7+ messages in thread From: Andrew Morton @ 2008-10-30 9:09 UTC (permalink / raw) To: Folkert van Heusden; +Cc: linux-kernel, netdev (cc netdev) On Mon, 27 Oct 2008 16:00:02 +0100 Folkert van Heusden <folkert@vanheusden.com> wrote: > Hi, > > While running my http://vanheusden.com/pyk/ script (which randomly > inserts and removes modules) I triggered the folllowing oops in a 2.6.26 > kernel on a pre-ht pentium 4 (hp pavillion laptop type zv5231ea): > > [ 1037.480097] BUG: unable to handle kernel NULL pointer dereference at 00000235 > [ 1037.480188] IP: [<c0261078>] __linkwatch_run_queue+0x6d/0x15a > [ 1037.480256] *pde = 00000000 > [ 1037.480315] Oops: 0000 [#1] SMP > [ 1037.480393] Modules linked in: pcspkr(+) dm_snapshot joydev cdrom cpufreq_conservative snd_atiixp_modem(+) ohci_hcd(+) serio_raw wmi loop i2c_piix4 button ac sbp2 parport_pc lp psmouse output snd_seq_device pci_hotplug arc4 ecb crypto_blkcipher b43legacy input_polldev led_class rng_core rfkill_input rfkill cpufreq_stats ssb pcmcia firmware_class mac80211 parport cpufreq_ondemand dm_mirror i2c_core freq_table cfg80211 dm_log snd_ac97_codec ac97_bus ieee1394 scsi_mod snd_pcm snd_timer yenta_socket rsrc_nonstatic snd_page_alloc radeon drm rfcomm l2cap bluetooth ipv6 snd soundcore ati_agp agpgart evdev ext3 jbd mbcache dm_mod ide_disk 8139cp atiixp 8139too mii pcmcia_core ide_pci_generic ide_core usbcore processor thermal_sys [last unloaded: b44] > [ 1037.482294] > [ 1037.482328] Pid: 6, comm: events/0 Not tainted (2.6.26-1-686 #1) > [ 1037.482365] EIP: 0060:[<c0261078>] EFLAGS: 00010246 CPU: 0 > [ 1037.482403] EIP is at __linkwatch_run_queue+0x6d/0x15a > [ 1037.482439] EAX: f744c000 EBX: 00000001 ECX: c03632a4 EDX: 01553000 > [ 1037.482476] ESI: 00000003 EDI: 00000000 EBP: 00000001 ESP: f744df94 > [ 1037.482536] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 > [ 1037.482572] Process events/0 (pid: 6, ti=f744c000 task=f743e460 task.ti=f744c000) > [ 1037.482609] Stack: f7414600 c03632a0 c0261165 00000000 c0261182 c012edee f7414600 c012f4c9 > [ 1037.482855] f741460c c012f57c 00000000 f743e460 c013177c f744dfc8 f744dfc8 f7414600 > [ 1037.483101] 00000000 c01316bb c0131683 00000000 c01044f3 f7441ef8 00000000 00000000 > [ 1037.483346] Call Trace: > [ 1037.483412] [<c0261165>] linkwatch_event+0x0/0x22 > [ 1037.483474] [<c0261182>] linkwatch_event+0x1d/0x22 > [ 1037.483534] [<c012edee>] run_workqueue+0x74/0xf2 > [ 1037.483596] [<c012f4c9>] worker_thread+0x0/0xbd > [ 1037.483656] [<c012f57c>] worker_thread+0xb3/0xbd > [ 1037.483718] [<c013177c>] autoremove_wake_function+0x0/0x2d > [ 1037.483785] [<c01316bb>] kthread+0x38/0x5d > [ 1037.483842] [<c0131683>] kthread+0x0/0x5d > [ 1037.483901] [<c01044f3>] kernel_thread_helper+0x7/0x10 > [ 1037.483974] ======================= > [ 1037.484007] Code: 80 05 00 8b 1d 9c b5 41 c0 c7 05 9c b5 41 c0 00 00 00 00 90 fe 05 90 b5 41 c0 fb 0f 1f 84 00 00 00 00 00 90 e9 ce 00 00 00 85 ff <8b> ab 34 02 00 00 74 17 89 d8 e8 7d fe ff ff 85 c0 75 0c 89 d8 > [ 1037.484013] EIP: [<c0261078>] __linkwatch_run_queue+0x6d/0x15a SS:ESP 0068:f744df94 > [ 1037.485642] ---[ end trace 69e935fd5d0e15c9 ]--- > > > Folkert van Heusden ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH] Re: [2.6.26] OOPS in __linkwatch_run_queue (unable to handle kernel NULL pointer dereference at 00000235) 2008-10-30 9:09 ` Andrew Morton @ 2008-11-17 8:40 ` Jarek Poplawski 2008-11-17 8:50 ` David Miller 0 siblings, 1 reply; 7+ messages in thread From: Jarek Poplawski @ 2008-11-17 8:40 UTC (permalink / raw) To: David Miller; +Cc: Folkert van Heusden, Andrew Morton, linux-kernel, netdev On 30-10-2008 10:09, Andrew Morton wrote: > (cc netdev) > > On Mon, 27 Oct 2008 16:00:02 +0100 Folkert van Heusden <folkert@vanheusden.com> wrote: > >> Hi, >> >> While running my http://vanheusden.com/pyk/ script (which randomly >> inserts and removes modules) I triggered the folllowing oops in a 2.6.26 >> kernel on a pre-ht pentium 4 (hp pavillion laptop type zv5231ea): >> >> [ 1037.480097] BUG: unable to handle kernel NULL pointer dereference at 00000235 >> [ 1037.480188] IP: [<c0261078>] __linkwatch_run_queue+0x6d/0x15a ... -------------------> net: link_watch: Don't add a linkwatch event before register_netdev() b44 and some other network drivers run netif_carrier_off() before register_netdev(). Then, if register fails, free_netdev() destruction is done while dev is still referenced and held on the lweventlist. Of course, it would be nice if all drivers could use some common order of calling things like register_netdev() vs. netif_carrier_off(), but since there is a lot of this I guess there is probably some reason, so this patch doesn't change the order but assumes that such an early netif_carrier_off() is only to set the __LINK_STATE_NOCARRIER flag, and some netif_carrier_on()/_off() will still follow. Reported-by: Folkert van Heusden <folkert@vanheusden.com> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> --- diff --git a/net/core/link_watch.c b/net/core/link_watch.c index bf8f7af..393c2ba 100644 --- a/net/core/link_watch.c +++ b/net/core/link_watch.c @@ -216,8 +216,13 @@ void linkwatch_fire_event(struct net_device *dev) bool urgent = linkwatch_urgent_event(dev); if (!test_and_set_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state)) { - dev_hold(dev); + /* don't add an event before register_netdev(); it can fail */ + if (!test_bit(__LINK_STATE_PRESENT, &dev->state)) { + WARN_ON(1); + return; + } + dev_hold(dev); linkwatch_add_event(dev); } else if (!urgent) return; diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index 80c8f3d..8f99c06 100644 --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -286,7 +286,8 @@ EXPORT_SYMBOL(netif_carrier_on); void netif_carrier_off(struct net_device *dev) { if (!test_and_set_bit(__LINK_STATE_NOCARRIER, &dev->state)) - linkwatch_fire_event(dev); + if (test_bit(__LINK_STATE_PRESENT, &dev->state)) + linkwatch_fire_event(dev); } EXPORT_SYMBOL(netif_carrier_off); ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Re: [2.6.26] OOPS in __linkwatch_run_queue (unable to handle kernel NULL pointer dereference at 00000235) 2008-11-17 8:40 ` [PATCH] " Jarek Poplawski @ 2008-11-17 8:50 ` David Miller 2008-11-17 9:11 ` Jarek Poplawski 0 siblings, 1 reply; 7+ messages in thread From: David Miller @ 2008-11-17 8:50 UTC (permalink / raw) To: jarkao2; +Cc: folkert, akpm, linux-kernel, netdev From: Jarek Poplawski <jarkao2@gmail.com> Date: Mon, 17 Nov 2008 08:40:58 +0000 > net: link_watch: Don't add a linkwatch event before register_netdev() > > b44 and some other network drivers run netif_carrier_off() before > register_netdev(). Then, if register fails, free_netdev() destruction > is done while dev is still referenced and held on the lweventlist. > > Of course, it would be nice if all drivers could use some common order > of calling things like register_netdev() vs. netif_carrier_off(), but > since there is a lot of this I guess there is probably some reason, > so this patch doesn't change the order but assumes that such an early > netif_carrier_off() is only to set the __LINK_STATE_NOCARRIER flag, > and some netif_carrier_on()/_off() will still follow. > > Reported-by: Folkert van Heusden <folkert@vanheusden.com> > Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Ugh, drivers should not be fiddling with stuff like this before the device object is even registered. I can just imagine all sorts of other operations drivers might find it "convenient" to do before the sucker is even registered, causing similar if not even worse problems. It's pretty simple, make netif_carrier_off() be the first thing ->open() does and make it the last thing ->stop() and ->suspend() do. That's how to fix this bug. I'm going to fix this as follows. B44 already abided by 2/3 of this by handling the ->stop() and ->suspend() cases correctly already. b44: Do not call netif_carrier_off() before device is even registered. This is illegal. Instead simply do this first thing in ->open(). Signed-off-by: David S. Miller <davem@davemloft.net> diff --git a/drivers/net/b44.c b/drivers/net/b44.c index c3bda5c..2e353b8 100644 --- a/drivers/net/b44.c +++ b/drivers/net/b44.c @@ -1426,6 +1426,8 @@ static int b44_open(struct net_device *dev) struct b44 *bp = netdev_priv(dev); int err; + netif_carrier_off(dev); + err = b44_alloc_consistent(bp, GFP_KERNEL); if (err) goto out; @@ -2165,8 +2167,6 @@ static int __devinit b44_init_one(struct ssb_device *sdev, dev->irq = sdev->irq; SET_ETHTOOL_OPS(dev, &b44_ethtool_ops); - netif_carrier_off(dev); - err = ssb_bus_powerup(sdev->bus, 0); if (err) { dev_err(sdev->dev, ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Re: [2.6.26] OOPS in __linkwatch_run_queue (unable to handle kernel NULL pointer dereference at 00000235) 2008-11-17 8:50 ` David Miller @ 2008-11-17 9:11 ` Jarek Poplawski 2008-11-17 9:15 ` David Miller 0 siblings, 1 reply; 7+ messages in thread From: Jarek Poplawski @ 2008-11-17 9:11 UTC (permalink / raw) To: David Miller; +Cc: folkert, akpm, linux-kernel, netdev On Mon, Nov 17, 2008 at 12:50:25AM -0800, David Miller wrote: > From: Jarek Poplawski <jarkao2@gmail.com> > Date: Mon, 17 Nov 2008 08:40:58 +0000 > > > net: link_watch: Don't add a linkwatch event before register_netdev() > > > > b44 and some other network drivers run netif_carrier_off() before > > register_netdev(). Then, if register fails, free_netdev() destruction > > is done while dev is still referenced and held on the lweventlist. > > > > Of course, it would be nice if all drivers could use some common order > > of calling things like register_netdev() vs. netif_carrier_off(), but > > since there is a lot of this I guess there is probably some reason, > > so this patch doesn't change the order but assumes that such an early > > netif_carrier_off() is only to set the __LINK_STATE_NOCARRIER flag, > > and some netif_carrier_on()/_off() will still follow. > > > > Reported-by: Folkert van Heusden <folkert@vanheusden.com> > > Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> > > Ugh, drivers should not be fiddling with stuff like this before the > device object is even registered. > > I can just imagine all sorts of other operations drivers might find > it "convenient" to do before the sucker is even registered, causing > similar if not even worse problems. > > It's pretty simple, make netif_carrier_off() be the first thing > ->open() does and make it the last thing ->stop() and ->suspend() > do. That's how to fix this bug. > > I'm going to fix this as follows. B44 already abided by 2/3 of > this by handling the ->stop() and ->suspend() cases correctly already. > > b44: Do not call netif_carrier_off() before device is even registered. > > This is illegal. Instead simply do this first thing in ->open(). Yes, this looks very nice! But this should be done in "a few" more drivers, like cxgb3 etc. Jarek P. PS: I think "Reported-by" could stay anyway. > Signed-off-by: David S. Miller <davem@davemloft.net> > > diff --git a/drivers/net/b44.c b/drivers/net/b44.c > index c3bda5c..2e353b8 100644 > --- a/drivers/net/b44.c > +++ b/drivers/net/b44.c > @@ -1426,6 +1426,8 @@ static int b44_open(struct net_device *dev) > struct b44 *bp = netdev_priv(dev); > int err; > > + netif_carrier_off(dev); > + > err = b44_alloc_consistent(bp, GFP_KERNEL); > if (err) > goto out; > @@ -2165,8 +2167,6 @@ static int __devinit b44_init_one(struct ssb_device *sdev, > dev->irq = sdev->irq; > SET_ETHTOOL_OPS(dev, &b44_ethtool_ops); > > - netif_carrier_off(dev); > - > err = ssb_bus_powerup(sdev->bus, 0); > if (err) { > dev_err(sdev->dev, ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Re: [2.6.26] OOPS in __linkwatch_run_queue (unable to handle kernel NULL pointer dereference at 00000235) 2008-11-17 9:11 ` Jarek Poplawski @ 2008-11-17 9:15 ` David Miller 2008-11-19 23:35 ` David Miller 0 siblings, 1 reply; 7+ messages in thread From: David Miller @ 2008-11-17 9:15 UTC (permalink / raw) To: jarkao2; +Cc: folkert, akpm, linux-kernel, netdev From: Jarek Poplawski <jarkao2@gmail.com> Date: Mon, 17 Nov 2008 09:11:08 +0000 > On Mon, Nov 17, 2008 at 12:50:25AM -0800, David Miller wrote: > Yes, this looks very nice! But this should be done in "a few" more > drivers, like cxgb3 etc. Thanks for pointing that out, I'll fix those cases too. > PS: I think "Reported-by" could stay anyway. Absolutely, I'll fix that. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Re: [2.6.26] OOPS in __linkwatch_run_queue (unable to handle kernel NULL pointer dereference at 00000235) 2008-11-17 9:15 ` David Miller @ 2008-11-19 23:35 ` David Miller 0 siblings, 0 replies; 7+ messages in thread From: David Miller @ 2008-11-19 23:35 UTC (permalink / raw) To: jarkao2; +Cc: folkert, akpm, linux-kernel, netdev From: David Miller <davem@davemloft.net> Date: Mon, 17 Nov 2008 01:15:09 -0800 (PST) > From: Jarek Poplawski <jarkao2@gmail.com> > Date: Mon, 17 Nov 2008 09:11:08 +0000 > > > On Mon, Nov 17, 2008 at 12:50:25AM -0800, David Miller wrote: > > Yes, this looks very nice! But this should be done in "a few" more > > drivers, like cxgb3 etc. > > Thanks for pointing that out, I'll fix those cases too. I did an audit, and found that this construct is too pervasive to fix right now. Even e1000 and e1000e do this call of netif_carrier_off() before the device is even registered. So here is the bandaid I'll use to fix the bug in 2.6.28 net: Do not fire linkwatch events until the device is registered. Several device drivers try to do things like netif_carrier_off() before register_netdev() is invoked. This is bogus, but too many drivers do this to fix them all up in one go. Reported-by: Folkert van Heusden <folkert@vanheusden.com> Signed-off-by: David S. Miller <davem@davemloft.net> --- net/sched/sch_generic.c | 7 ++++++- 1 files changed, 6 insertions(+), 1 deletions(-) diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index 93cd30c..cdcd16f 100644 --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -270,6 +270,8 @@ static void dev_watchdog_down(struct net_device *dev) void netif_carrier_on(struct net_device *dev) { if (test_and_clear_bit(__LINK_STATE_NOCARRIER, &dev->state)) { + if (dev->reg_state == NETREG_UNINITIALIZED) + return; linkwatch_fire_event(dev); if (netif_running(dev)) __netdev_watchdog_up(dev); @@ -285,8 +287,11 @@ EXPORT_SYMBOL(netif_carrier_on); */ void netif_carrier_off(struct net_device *dev) { - if (!test_and_set_bit(__LINK_STATE_NOCARRIER, &dev->state)) + if (!test_and_set_bit(__LINK_STATE_NOCARRIER, &dev->state)) { + if (dev->reg_state == NETREG_UNINITIALIZED) + return; linkwatch_fire_event(dev); + } } EXPORT_SYMBOL(netif_carrier_off); -- 1.5.6.5 ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2008-11-19 23:35 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2008-10-27 15:00 [2.6.26] OOPS in __linkwatch_run_queue (unable to handle kernel NULL pointer dereference at 00000235) Folkert van Heusden 2008-10-30 9:09 ` Andrew Morton 2008-11-17 8:40 ` [PATCH] " Jarek Poplawski 2008-11-17 8:50 ` David Miller 2008-11-17 9:11 ` Jarek Poplawski 2008-11-17 9:15 ` David Miller 2008-11-19 23:35 ` David Miller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).