LKML Archive on lore.kernel.org
* [BUG] linux-2.6.28-rc3 regression: IRQ smp_affinities not respected
@ 2008-11-05  6:33 Eric Dumazet
  2008-11-05  6:45 ` Michael Chan
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2008-11-05  6:33 UTC (permalink / raw)
  To: linux kernel, Linux Netdev List

Hi all

There seems to be one more problem with 2.6.28-rc3.

I wanted to get maximum throughput from my machine on a network benchmark with 3 Gigabit
links delivering 600,000 packets per second, so I tried to use smp_affinity to
dedicate one CPU to each NIC.

It worked with 2.6.27, so there is a regression on this part.

Unfortunately, git bisect is nearly impossible for me, as after two steps
the resulting kernel doesn't even boot on this machine (same for the oprofile
regression I mentioned earlier).

# grep eth1 /proc/interrupts
 45:      20425      20418      20445      20441      20345      20349      20384      20397   PCI-MSI-edge      eth1
# cat /proc/irq/45/smp_affinity
ff
# echo 1 >/proc/irq/45/smp_affinity
# cat /proc/irq/45/smp_affinity
01
# grep eth1 /proc/interrupts
 45:      20928      20920      20943      20940      20845      20847      20887      20898   PCI-MSI-edge      eth1
# grep eth1 /proc/interrupts
 45:      21037      21030      21053      21049      20953      20956      20997      21007   PCI-MSI-edge      eth1
# grep eth1 /proc/interrupts
 45:      21141      21134      21156      21154      21057      21059      21101      21110   PCI-MSI-edge      eth1

You can see that interrupts keep being spread across all CPUs instead of going to CPU0 only.
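
The spreading is easier to see as per-CPU deltas between two samples than as raw counters. Between the second and third samples above, every CPU gains roughly 110 interrupts, whereas with a working affinity mask of 01 only the CPU0 column should move. The helper below is a minimal sketch of that comparison; the function names and the MAX_CPUS bound are hypothetical, and it assumes the line layout shown in the transcript (IRQ label, one counter per CPU, then the chip name).

```c
#include <stdlib.h>
#include <string.h>

#define MAX_CPUS 64	/* arbitrary bound chosen for this sketch */

/* Parse the per-CPU counters from one /proc/interrupts line, e.g.
 * " 45:  20928  20920 ...  PCI-MSI-edge  eth1".
 * Returns how many counters were stored in counts[] (at most max). */
static int parse_irq_counts(const char *line, unsigned long *counts, int max)
{
	const char *p = strchr(line, ':');
	int n = 0;

	if (!p)
		return 0;
	p++;	/* skip past the "45:" IRQ label */

	while (n < max) {
		char *end;
		unsigned long v = strtoul(p, &end, 10);

		if (end == p)	/* no digits converted: reached the chip-name column */
			break;
		counts[n++] = v;
		p = end;
	}
	return n;
}

/* Store after[i] - before[i] for each CPU column in delta[].
 * Returns the number of CPU columns, or -1 if the two lines disagree. */
static int irq_deltas(const char *before, const char *after,
		      unsigned long *delta, int max)
{
	unsigned long a[MAX_CPUS], b[MAX_CPUS];
	int i, na, nb;

	if (max > MAX_CPUS)
		max = MAX_CPUS;
	nb = parse_irq_counts(before, b, max);
	na = parse_irq_counts(after, a, max);
	if (na != nb || na == 0)
		return -1;
	for (i = 0; i < na; i++)
		delta[i] = a[i] - b[i];
	return na;
}
```

Feeding it two consecutive "grep eth1 /proc/interrupts" lines gives one delta per CPU; a respected single-CPU affinity would show non-zero growth in exactly one column.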

That *kills* performance on high end routers and servers.

07:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (rev 12)
        Subsystem: Hewlett-Packard Company NC373i Integrated Multifunction Gigabit Server Adapter
        Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 45
        Memory at fa000000 (64-bit, non-prefetchable) [size=32M]
        [virtual] Expansion ROM at d1000000 [disabled] [size=16K]
        Capabilities: [40] PCI-X non-bridge device.
        Capabilities: [48] Power Management version 2
        Capabilities: [50] Vital Product Data
        Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable+

bnx2 driver

Thanks


* Re: [BUG] linux-2.6.28-rc3 regression: IRQ smp_affinities not respected
  2008-11-05  6:33 [BUG] linux-2.6.28-rc3 regression: IRQ smp_affinities not respected Eric Dumazet
@ 2008-11-05  6:45 ` Michael Chan
  2008-11-05  8:09   ` Eric Dumazet
  2008-11-05  9:58   ` David Miller
  0 siblings, 2 replies; 10+ messages in thread
From: Michael Chan @ 2008-11-05  6:45 UTC (permalink / raw)
  To: 'Eric Dumazet', linux kernel, Linux Netdev List

Eric Dumazet wrote:

> Hi all
>
> One more problem it seems on 2.6.28-rc3
>
> I wanted to get maximal throughput from my machine on a
> network bench with 3 Gigabit
> links delivering 600.000 packets per second, so I tried to
> play with smp_affinity to
> dedicate one CPU for each NIC.
>
> It worked with 2.6.27, so there is a regression on this part.

I believe this may be the patch that broke it:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ce6fce4295ba727b36fdc73040e444bd1aae64cd

I don't remember all the details, but the Broadcom 5708 chip is
affected because it does not support MSI per-vector masking.

One way to work around it is to disable MSI with the bnx2 module parameter
disable_msi=1.
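
For readers unfamiliar with the term: per-vector masking is an optional MSI feature that a device advertises through one bit of its 16-bit Message Control register, next to the Enable and 64-bit bits that lspci prints. The sketch below decodes that register. The macro and struct names are local to this example, and the flag value 0x0081 used in the note afterwards is an illustrative assumption chosen to be consistent with "64bit+ ... Enable+" and with the statement above that the 5708 lacks masking; it was not read from the hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of the 16-bit MSI Message Control register.
 * The macro names are local to this sketch. */
#define MSI_FLAGS_ENABLE	0x0001u	/* MSI delivery enabled */
#define MSI_FLAGS_QMASK		0x000eu	/* log2 of vectors the device can use */
#define MSI_FLAGS_QSIZE		0x0070u	/* log2 of vectors the OS allocated */
#define MSI_FLAGS_64BIT		0x0080u	/* 64-bit message address capable */
#define MSI_FLAGS_MASKBIT	0x0100u	/* per-vector masking capable */

struct msi_caps {
	bool enabled;
	bool addr64;
	bool maskable;
	unsigned int vectors_capable;
	unsigned int vectors_enabled;
};

/* Split a Message Control value into its individual fields. */
static struct msi_caps decode_msi_flags(uint16_t flags)
{
	struct msi_caps c;

	c.enabled = (flags & MSI_FLAGS_ENABLE) != 0;
	c.addr64 = (flags & MSI_FLAGS_64BIT) != 0;
	c.maskable = (flags & MSI_FLAGS_MASKBIT) != 0;
	c.vectors_capable = 1u << ((flags & MSI_FLAGS_QMASK) >> 1);
	c.vectors_enabled = 1u << ((flags & MSI_FLAGS_QSIZE) >> 4);
	return c;
}
```

With the assumed value 0x0081, decode_msi_flags() reports an enabled, 64-bit capable, single-vector device with maskable == false. A device without the mask bit cannot have its vector masked while the message address and data are rewritten, which is the situation described above.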




* Re: [BUG] linux-2.6.28-rc3 regression: IRQ smp_affinities not respected
  2008-11-05  6:45 ` Michael Chan
@ 2008-11-05  8:09   ` Eric Dumazet
  2008-11-06  1:46     ` Michael Chan
  2008-11-26  9:01     ` Eric Dumazet
  2008-11-05  9:58   ` David Miller
  1 sibling, 2 replies; 10+ messages in thread
From: Eric Dumazet @ 2008-11-05  8:09 UTC (permalink / raw)
  To: ut; +Cc: linux kernel, Linux Netdev List

Michael Chan wrote:
> Eric Dumazet wrote:
> 
>> Hi all
>>
>> One more problem it seems on 2.6.28-rc3
>>
>> I wanted to get maximal throughput from my machine on a
>> network bench with 3 Gigabit
>> links delivering 600.000 packets per second, so I tried to
>> play with smp_affinity to
>> dedicate one CPU for each NIC.
>>
>> It worked with 2.6.27, so there is a regression on this part.
> 
> I believe this may be the patch that broke it:
> 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ce6fce4295ba727b36fdc73040e444bd1aae64cd
> 
> I don't remember all the details, but the Broadcom 5708 chip is
> affected because it does not support MSI per-vector masking.
> 
> One way to get around is to disable MSI with bnx2 parameter
> disable_msi=1.
> 

Thanks Michael for your fast answer


I tried disabling MSI and yes, it now works.

 16:      42726        128        105        106         89         89        145        152   IO-APIC-fasteoi   uhci_hcd:usb1, eth0, eth1

But the same interrupt is now shared by two NICs, so I cannot split the load across two CPUs unless
I use an smp_affinity mask with 2 CPUs in it. Not exactly what I wanted to do.
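
For reference, the value written to /proc/irq/N/smp_affinity is a hexadecimal bitmask in which bit n selects CPU n, so "01" means CPU0 only, a mask with 2 CPUs such as CPU0 and CPU1 is "3", and the default "ff" on this 8-CPU machine selects all of them. The helpers below are a small sketch of that mapping; their names are hypothetical, and they only handle CPUs that fit in a single unsigned long.

```c
#include <stddef.h>
#include <stdio.h>

/* Build an affinity bitmask from a list of CPU numbers.
 * CPU numbers that do not fit in an unsigned long are skipped. */
static unsigned long cpus_to_mask(const int *cpus, size_t n)
{
	unsigned long mask = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (cpus[i] < 0 || (size_t)cpus[i] >= sizeof(mask) * 8)
			continue;	/* out of range for this sketch */
		mask |= 1UL << cpus[i];
	}
	return mask;
}

/* Format the mask as the hex string expected by smp_affinity.
 * Returns what snprintf() returns. */
static int format_affinity(unsigned long mask, char *buf, size_t len)
{
	return snprintf(buf, len, "%lx", mask);
}
```

The resulting string is what would be echoed into the smp_affinity file, in the same way as the "echo 1" earlier in the thread.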

Do you know if the BCM5715S (tg3 driver) has the same problem?

03:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (rev 12)
        Subsystem: Hewlett-Packard Company NC373i Integrated Multifunction Gigabit Server Adapter
        Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 16
        Memory at f6000000 (64-bit, non-prefetchable) [size=32M]
        [virtual] Expansion ROM at d1300000 [disabled] [size=16K]
        Capabilities: [40] PCI-X non-bridge device.
        Capabilities: [48] Power Management version 2
        Capabilities: [50] Vital Product Data
        Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-

07:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (rev 12)
        Subsystem: Hewlett-Packard Company NC373i Integrated Multifunction Gigabit Server Adapter
        Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 16
        Memory at fa000000 (64-bit, non-prefetchable) [size=32M]
        [virtual] Expansion ROM at d1000000 [disabled] [size=16K]
        Capabilities: [40] PCI-X non-bridge device.
        Capabilities: [48] Power Management version 2
        Capabilities: [50] Vital Product Data
        Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-

14:04.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5715S Gigabit Ethernet (rev a3)
        Subsystem: Hewlett-Packard Company NC326m PCIe Dual Port Adapter
        Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 46
        Memory at fdff0000 (64-bit, non-prefetchable) [size=64K]
        Memory at fdfe0000 (64-bit, non-prefetchable) [size=64K]
        [virtual] Expansion ROM at d1200000 [disabled] [size=128K]
        Capabilities: [40] PCI-X non-bridge device.
        Capabilities: [48] Power Management version 2
        Capabilities: [50] Vital Product Data
        Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable+

14:04.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5715S Gigabit Ethernet (rev a3)
        Subsystem: Hewlett-Packard Company NC326m PCIe Dual Port Adapter
        Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 16
        Memory at fdfd0000 (64-bit, non-prefetchable) [size=64K]
        Memory at fdfc0000 (64-bit, non-prefetchable) [size=64K]
        [virtual] Expansion ROM at d1220000 [disabled] [size=128K]
        Capabilities: [40] PCI-X non-bridge device.
        Capabilities: [48] Power Management version 2
        Capabilities: [50] Vital Product Data
        Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-



* Re: [BUG] linux-2.6.28-rc3 regression: IRQ smp_affinities not respected
  2008-11-05  6:45 ` Michael Chan
  2008-11-05  8:09   ` Eric Dumazet
@ 2008-11-05  9:58   ` David Miller
  1 sibling, 0 replies; 10+ messages in thread
From: David Miller @ 2008-11-05  9:58 UTC (permalink / raw)
  To: mchan; +Cc: dada1, linux-kernel, netdev

From: "Michael Chan" <mchan@broadcom.com>
Date: Tue, 4 Nov 2008 22:45:43 -0800

> I believe this may be the patch that broke it:
> 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ce6fce4295ba727b36fdc73040e444bd1aae64cd
> 
> I don't remember all the details, but the Broadcom 5708 chip is
> affected because it does not support MSI per-vector masking.
> 
> One way to get around is to disable MSI with bnx2 parameter
> disable_msi=1.

On first glance my reaction is that I think that patch should be
reverted, given what the tradeoff appears to be.



* Re: [BUG] linux-2.6.28-rc3 regression: IRQ smp_affinities not respected
  2008-11-05  8:09   ` Eric Dumazet
@ 2008-11-06  1:46     ` Michael Chan
  2008-11-26  9:01     ` Eric Dumazet
  1 sibling, 0 replies; 10+ messages in thread
From: Michael Chan @ 2008-11-06  1:46 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux kernel, Linux Netdev List


On Wed, 2008-11-05 at 00:09 -0800, Eric Dumazet wrote:
> Thanks Michael for your fast answer
> 
> 
> I tried this MSI disabling and yes, it now works.
> 
>  16:      42726        128        105        106         89         89
> 145        152   IO-APIC-fasteoi   uhci_hcd:usb1, eth0, eth1
> 
> But I now have same interrupt used on two NIC, so I cannot split the
> load on two CPUS, unless
> using a smp_affinity with 2 CPUS on it. Not exactly what I wanted to
> do..
> 
> Do you know if BCM5715S (tg3 driver) have same problem ?

Eric, I believe the 5715 should be OK.




* Re: [BUG] linux-2.6.28-rc3 regression: IRQ smp_affinities not respected
  2008-11-05  8:09   ` Eric Dumazet
  2008-11-06  1:46     ` Michael Chan
@ 2008-11-26  9:01     ` Eric Dumazet
  2008-11-26 17:14       ` Michael Chan
  1 sibling, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2008-11-26  9:01 UTC (permalink / raw)
  To: ut; +Cc: linux kernel, Linux Netdev List

Eric Dumazet wrote:
> Michael Chan wrote:
>> I believe this may be the patch that broke it:
>>
>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ce6fce4295ba727b36fdc73040e444bd1aae64cd 
>>
>>
>> I don't remember all the details, but the Broadcom 5708 chip is
>> affected because it does not support MSI per-vector masking.
>>
>> One way to get around is to disable MSI with bnx2 parameter
>> disable_msi=1.
>>

> 
> I tried this MSI disabling and yes, it now works.
> 
> 16:      42726        128        105        106         89         
> 89        145        152   IO-APIC-fasteoi   uhci_hcd:usb1, eth0, eth1
> 

I believe the bnx2 driver doesn't work at all if !disable_msi (the default setting).

Doing an "echo 0 >/sys/devices/system/cpu/cpu1/online" just freezes the network.

No messages are logged.

If loaded with disable_msi=1, the CPU unplug works as expected.

That's a pretty serious issue.




* Re: [BUG] linux-2.6.28-rc3 regression: IRQ smp_affinities not respected
  2008-11-26  9:01     ` Eric Dumazet
@ 2008-11-26 17:14       ` Michael Chan
  2008-12-01 23:23         ` Michael Chan
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Chan @ 2008-11-26 17:14 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux kernel, Linux Netdev List


On Wed, 2008-11-26 at 01:01 -0800, Eric Dumazet wrote:
> Eric Dumazet wrote:
> > Michael Chan wrote:
> >> I believe this may be the patch that broke it:
> >>
> >> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ce6fce4295ba727b36fdc73040e444bd1aae64cd 
> >>
> >>
> >> I don't remember all the details, but the Broadcom 5708 chip is
> >> affected because it does not support MSI per-vector masking.
> >>
> >> One way to get around is to disable MSI with bnx2 parameter
> >> disable_msi=1.
> >>
> 
> > 
> > I tried this MSI disabling and yes, it now works.
> > 
> > 16:      42726        128        105        106         89         
> > 89        145        152   IO-APIC-fasteoi   uhci_hcd:usb1, eth0, eth1
> > 
> 
> I believe the bnx2 driver doesnt work at all if !disable_msi (default setting)
> 
> Doing a "echo 0 >/sys/devices/system/cpu/cpu1/online" just freeze network
> 
> No messages logged
> 
> If loaded with disable_msi=1, the cpu unplug works as expected.
> 
> Thats a pretty serious issue.
>
Yes, that's the same issue and it is serious.  If MSI is being delivered
to CPU 1 and you then take CPU 1 offline, the MSI will not be delivered
to another CPU.

I think I can detect this problem in bnx2_timer() and try to recover.
I'll post a patch when I have something ready.  Thanks.




* Re: [BUG] linux-2.6.28-rc3 regression: IRQ smp_affinities not respected
  2008-11-26 17:14       ` Michael Chan
@ 2008-12-01 23:23         ` Michael Chan
  2008-12-02  6:04           ` Eric Dumazet
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Chan @ 2008-12-01 23:23 UTC (permalink / raw)
  To: Eric Dumazet, davem; +Cc: linux kernel, Linux Netdev List


On Wed, 2008-11-26 at 09:14 -0800, Michael Chan wrote:
> On Wed, 2008-11-26 at 01:01 -0800, Eric Dumazet wrote:
> > Eric Dumazet wrote:
> > > Michael Chan wrote:
> > >> I believe this may be the patch that broke it:
> > >>
> > >> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ce6fce4295ba727b36fdc73040e444bd1aae64cd 
> > >>
> > >>
> > >> I don't remember all the details, but the Broadcom 5708 chip is
> > >> affected because it does not support MSI per-vector masking.
> > >>
> > >> One way to get around is to disable MSI with bnx2 parameter
> > >> disable_msi=1.
> > >>
> > 
> > > 
> > > I tried this MSI disabling and yes, it now works.
> > > 
> > > 16:      42726        128        105        106         89         
> > > 89        145        152   IO-APIC-fasteoi   uhci_hcd:usb1, eth0, eth1
> > > 
> > 
> > I believe the bnx2 driver doesnt work at all if !disable_msi (default setting)
> > 
> > Doing a "echo 0 >/sys/devices/system/cpu/cpu1/online" just freeze network
> > 
> > No messages logged
> > 
> > If loaded with disable_msi=1, the cpu unplug works as expected.
> > 
> > Thats a pretty serious issue.
> >
> Yes, that's the same issue and it is serious.  If MSI is being delivered
> to CPU 1 and you then take CPU 1 offline, the MSI will not be delivered
> to another CPU.
> 
> I think I can detect this problem in bnx2_timer() and try to recover.
> I'll post a patch when I have something ready.  Thanks.

[PATCH] bnx2: Add workaround to handle missed MSI.

The bnx2 chips do not support per MSI vector masking.  On 5706/5708, new MSI
address/data are stored only when the MSI enable bit is toggled.  As a result,
SMP affinity no longer works in the latest kernel.  A more serious problem is
that the driver will no longer receive interrupts when the MSI receiving CPU
goes offline.

The workaround in this patch only addresses the problem of CPU going offline.
When that happens, the driver's timer function will detect that it is making
no forward progress on pending interrupt events and will recover from it.

Eric Dumazet reported the problem.

We also found that if an interrupt is internally asserted while MSI and INTA
are disabled, the chip will end up in the same state after MSI is re-enabled.
The same workaround is needed for this problem. 

Signed-off-by: Michael Chan <mchan@broadcom.com>
---
 drivers/net/bnx2.c |   35 ++++++++++++++++++++++++++++++++---
 drivers/net/bnx2.h |    6 ++++++
 2 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 182f241..0e2218d 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -3147,6 +3147,28 @@ bnx2_has_work(struct bnx2_napi *bnapi)
 	return 0;
 }
 
+static void
+bnx2_chk_missed_msi(struct bnx2 *bp)
+{
+	struct bnx2_napi *bnapi = &bp->bnx2_napi[0];
+	u32 msi_ctrl;
+
+	if (bnx2_has_work(bnapi)) {
+		msi_ctrl = REG_RD(bp, BNX2_PCICFG_MSI_CONTROL);
+		if (!(msi_ctrl & BNX2_PCICFG_MSI_CONTROL_ENABLE))
+			return;
+
+		if (bnapi->last_status_idx == bp->idle_chk_status_idx) {
+			REG_WR(bp, BNX2_PCICFG_MSI_CONTROL, msi_ctrl &
+			       ~BNX2_PCICFG_MSI_CONTROL_ENABLE);
+			REG_WR(bp, BNX2_PCICFG_MSI_CONTROL, msi_ctrl);
+			bnx2_msi(bp->irq_tbl[0].vector, bnapi);
+		}
+	}
+
+	bp->idle_chk_status_idx = bnapi->last_status_idx;
+}
+
 static void bnx2_poll_link(struct bnx2 *bp, struct bnx2_napi *bnapi)
 {
 	struct status_block *sblk = bnapi->status_blk.msi;
@@ -3221,14 +3243,15 @@ static int bnx2_poll(struct napi_struct *napi, int budget)
 
 		work_done = bnx2_poll_work(bp, bnapi, work_done, budget);
 
-		if (unlikely(work_done >= budget))
-			break;
-
 		/* bnapi->last_status_idx is used below to tell the hw how
 		 * much work has been processed, so we must read it before
 		 * checking for more work.
 		 */
 		bnapi->last_status_idx = sblk->status_idx;
+
+		if (unlikely(work_done >= budget))
+			break;
+
 		rmb();
 		if (likely(!bnx2_has_work(bnapi))) {
 			netif_rx_complete(bp->dev, napi);
@@ -4581,6 +4604,8 @@ bnx2_init_chip(struct bnx2 *bp)
 	for (i = 0; i < BNX2_MAX_MSIX_VEC; i++)
 		bp->bnx2_napi[i].last_status_idx = 0;
 
+	bp->idle_chk_status_idx = 0xffff;
+
 	bp->rx_mode = BNX2_EMAC_RX_MODE_SORT_MODE;
 
 	/* Set up how to generate a link change interrupt. */
@@ -5729,6 +5754,10 @@ bnx2_timer(unsigned long data)
 	if (atomic_read(&bp->intr_sem) != 0)
 		goto bnx2_restart_timer;
 
+	if ((bp->flags & (BNX2_FLAG_USING_MSI | BNX2_FLAG_ONE_SHOT_MSI)) ==
+	     BNX2_FLAG_USING_MSI)
+		bnx2_chk_missed_msi(bp);
+
 	bnx2_send_heart_beat(bp);
 
 	bp->stats_blk->stat_FwRxDrop =
diff --git a/drivers/net/bnx2.h b/drivers/net/bnx2.h
index 0763108..2f43c45 100644
--- a/drivers/net/bnx2.h
+++ b/drivers/net/bnx2.h
@@ -378,6 +378,9 @@ struct l2_fhdr {
  *  pci_config_l definition
  *  offset: 0000
  */
+#define BNX2_PCICFG_MSI_CONTROL				0x00000058
+#define BNX2_PCICFG_MSI_CONTROL_ENABLE			 (1L<<16)
+
 #define BNX2_PCICFG_MISC_CONFIG				0x00000068
 #define BNX2_PCICFG_MISC_CONFIG_TARGET_BYTE_SWAP	 (1L<<2)
 #define BNX2_PCICFG_MISC_CONFIG_TARGET_MB_WORD_SWAP	 (1L<<3)
@@ -6882,6 +6885,9 @@ struct bnx2 {
 
 	u8			num_tx_rings;
 	u8			num_rx_rings;
+
+	u32			idle_chk_status_idx;
+
 };
 
 #define REG_RD(bp, offset)					\
-- 
1.5.6.GIT
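
The detection logic in bnx2_chk_missed_msi() can be hard to follow without the rest of the driver, so here is a user-space model of its control flow. It is a sketch under stated assumptions, not driver code: the struct, the model_* names and the "pending" counter are hypothetical stand-ins for bnx2_has_work() and the status block, and servicing happens synchronously instead of through NAPI. The enable bit sits at bit 16 because a 32-bit read of the MSI capability (at 0x58, matching the "Capabilities: [58]" line in the lspci output) places the 16-bit Message Control register in the upper half.

```c
#include <stdbool.h>
#include <stdint.h>

/* Message Control occupies the upper 16 bits of the dword read at the
 * capability base, so its Enable bit (bit 0) appears as bit 16. */
#define MODEL_MSI_CONTROL_ENABLE	(1u << 16)

struct model_nic {
	uint32_t msi_ctrl;		/* stand-in for BNX2_PCICFG_MSI_CONTROL */
	unsigned int pending;		/* events queued by the chip, unserviced */
	uint16_t hw_status_idx;		/* index the chip last published */
	uint16_t last_status_idx;	/* index the driver last processed */
	uint32_t idle_chk_status_idx;	/* last_status_idx at the previous tick */
	unsigned int enable_toggles;	/* how often the workaround fired */
};

static void model_init(struct model_nic *nic)
{
	nic->msi_ctrl = MODEL_MSI_CONTROL_ENABLE;
	nic->pending = 0;
	nic->hw_status_idx = 0;
	nic->last_status_idx = 0;
	nic->idle_chk_status_idx = 0xffff;	/* same sentinel as the patch */
	nic->enable_toggles = 0;
}

/* The chip records an event, but in this model its MSI never arrives. */
static void model_lost_event(struct model_nic *nic)
{
	nic->pending++;
	nic->hw_status_idx++;
}

/* Simplified stand-in for the interrupt handler plus the NAPI poll. */
static void model_service(struct model_nic *nic)
{
	nic->pending = 0;
	nic->last_status_idx = nic->hw_status_idx;
}

/* One timer tick. Mirrors the control flow of bnx2_chk_missed_msi() and
 * returns true when the workaround fired on this tick. */
static bool model_chk_missed_msi(struct model_nic *nic)
{
	bool recovered = false;

	if (nic->pending) {
		if (!(nic->msi_ctrl & MODEL_MSI_CONTROL_ENABLE))
			return false;

		if (nic->last_status_idx == nic->idle_chk_status_idx) {
			/* No progress since the last tick: toggle the
			 * enable bit off and on, then service by hand. */
			nic->msi_ctrl &= ~MODEL_MSI_CONTROL_ENABLE;
			nic->msi_ctrl |= MODEL_MSI_CONTROL_ENABLE;
			nic->enable_toggles++;
			model_service(nic);
			recovered = true;
		}
	}

	nic->idle_chk_status_idx = nic->last_status_idx;
	return recovered;
}
```

In this model the workaround never fires on the first tick after an event is lost. The first tick only records last_status_idx; the second tick sees the same index with work still pending and recovers. That two-tick requirement is what keeps a merely busy NIC from being treated as stuck.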






* Re: [BUG] linux-2.6.28-rc3 regression: IRQ smp_affinities not respected
  2008-12-01 23:23         ` Michael Chan
@ 2008-12-02  6:04           ` Eric Dumazet
  2008-12-03  8:36             ` David Miller
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2008-12-02  6:04 UTC (permalink / raw)
  To: Michael Chan; +Cc: davem, linux kernel, Linux Netdev List

Michael Chan wrote:
> On Wed, 2008-11-26 at 09:14 -0800, Michael Chan wrote:
>> On Wed, 2008-11-26 at 01:01 -0800, Eric Dumazet wrote:
>>> Eric Dumazet wrote:
>>>> Michael Chan wrote:
>>>>> I believe this may be the patch that broke it:
>>>>>
>>>>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ce6fce4295ba727b36fdc73040e444bd1aae64cd 
>>>>>
>>>>>
>>>>> I don't remember all the details, but the Broadcom 5708 chip is
>>>>> affected because it does not support MSI per-vector masking.
>>>>>
>>>>> One way to get around is to disable MSI with bnx2 parameter
>>>>> disable_msi=1.
>>>>>
>>>> I tried this MSI disabling and yes, it now works.
>>>>
>>>> 16:      42726        128        105        106         89         
>>>> 89        145        152   IO-APIC-fasteoi   uhci_hcd:usb1, eth0, eth1
>>>>
>>> I believe the bnx2 driver doesnt work at all if !disable_msi (default setting)
>>>
>>> Doing a "echo 0 >/sys/devices/system/cpu/cpu1/online" just freeze network
>>>
>>> No messages logged
>>>
>>> If loaded with disable_msi=1, the cpu unplug works as expected.
>>>
>>> Thats a pretty serious issue.
>>>
>> Yes, that's the same issue and it is serious.  If MSI is being delivered
>> to CPU 1 and you then take CPU 1 offline, the MSI will not be delivered
>> to another CPU.
>>
>> I think I can detect this problem in bnx2_timer() and try to recover.
>> I'll post a patch when I have something ready.  Thanks.
> 
> [PATCH] bnx2: Add workaround to handle missed MSI.
> 
> The bnx2 chips do not support per MSI vector masking.  On 5706/5708, new MSI
> address/data are stored only when the MSI enable bit is toggled.  As a result,
> SMP affinity no longer works in the latest kernel.  A more serious problem is
> that the driver will no longer receive interrupts when the MSI receiving CPU
> goes offline.
> 
> The workaround in this patch only addresses the problem of CPU going offline.
> When that happens, the driver's timer function will detect that it is making
> no forward progress on pending interrupt events and will recover from it.
> 
> Eric Dumazet reported the problem.
> 
> We also found that if an interrupt is internally asserted while MSI and INTA
> are disabled, the chip will end up in the same state after MSI is re-enabled.
> The same workaround is needed for this problem. 
> 
> Signed-off-by: Michael Chan <mchan@broadcom.com>
> ---
>  drivers/net/bnx2.c |   35 ++++++++++++++++++++++++++++++++---
>  drivers/net/bnx2.h |    6 ++++++
>  2 files changed, 38 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
> index 182f241..0e2218d 100644
> --- a/drivers/net/bnx2.c
> +++ b/drivers/net/bnx2.c
> @@ -3147,6 +3147,28 @@ bnx2_has_work(struct bnx2_napi *bnapi)
>  	return 0;
>  }
>  
> +static void
> +bnx2_chk_missed_msi(struct bnx2 *bp)
> +{
> +	struct bnx2_napi *bnapi = &bp->bnx2_napi[0];
> +	u32 msi_ctrl;
> +
> +	if (bnx2_has_work(bnapi)) {
> +		msi_ctrl = REG_RD(bp, BNX2_PCICFG_MSI_CONTROL);
> +		if (!(msi_ctrl & BNX2_PCICFG_MSI_CONTROL_ENABLE))
> +			return;
> +
> +		if (bnapi->last_status_idx == bp->idle_chk_status_idx) {
> +			REG_WR(bp, BNX2_PCICFG_MSI_CONTROL, msi_ctrl &
> +			       ~BNX2_PCICFG_MSI_CONTROL_ENABLE);
> +			REG_WR(bp, BNX2_PCICFG_MSI_CONTROL, msi_ctrl);
> +			bnx2_msi(bp->irq_tbl[0].vector, bnapi);
> +		}
> +	}
> +
> +	bp->idle_chk_status_idx = bnapi->last_status_idx;
> +}
> +
>  static void bnx2_poll_link(struct bnx2 *bp, struct bnx2_napi *bnapi)
>  {
>  	struct status_block *sblk = bnapi->status_blk.msi;
> @@ -3221,14 +3243,15 @@ static int bnx2_poll(struct napi_struct *napi, int budget)
>  
>  		work_done = bnx2_poll_work(bp, bnapi, work_done, budget);
>  
> -		if (unlikely(work_done >= budget))
> -			break;
> -
>  		/* bnapi->last_status_idx is used below to tell the hw how
>  		 * much work has been processed, so we must read it before
>  		 * checking for more work.
>  		 */
>  		bnapi->last_status_idx = sblk->status_idx;
> +
> +		if (unlikely(work_done >= budget))
> +			break;
> +
>  		rmb();
>  		if (likely(!bnx2_has_work(bnapi))) {
>  			netif_rx_complete(bp->dev, napi);
> @@ -4581,6 +4604,8 @@ bnx2_init_chip(struct bnx2 *bp)
>  	for (i = 0; i < BNX2_MAX_MSIX_VEC; i++)
>  		bp->bnx2_napi[i].last_status_idx = 0;
>  
> +	bp->idle_chk_status_idx = 0xffff;
> +
>  	bp->rx_mode = BNX2_EMAC_RX_MODE_SORT_MODE;
>  
>  	/* Set up how to generate a link change interrupt. */
> @@ -5729,6 +5754,10 @@ bnx2_timer(unsigned long data)
>  	if (atomic_read(&bp->intr_sem) != 0)
>  		goto bnx2_restart_timer;
>  
> +	if ((bp->flags & (BNX2_FLAG_USING_MSI | BNX2_FLAG_ONE_SHOT_MSI)) ==
> +	     BNX2_FLAG_USING_MSI)
> +		bnx2_chk_missed_msi(bp);
> +
>  	bnx2_send_heart_beat(bp);
>  
>  	bp->stats_blk->stat_FwRxDrop =
> diff --git a/drivers/net/bnx2.h b/drivers/net/bnx2.h
> index 0763108..2f43c45 100644
> --- a/drivers/net/bnx2.h
> +++ b/drivers/net/bnx2.h
> @@ -378,6 +378,9 @@ struct l2_fhdr {
>   *  pci_config_l definition
>   *  offset: 0000
>   */
> +#define BNX2_PCICFG_MSI_CONTROL				0x00000058
> +#define BNX2_PCICFG_MSI_CONTROL_ENABLE			 (1L<<16)
> +
>  #define BNX2_PCICFG_MISC_CONFIG				0x00000068
>  #define BNX2_PCICFG_MISC_CONFIG_TARGET_BYTE_SWAP	 (1L<<2)
>  #define BNX2_PCICFG_MISC_CONFIG_TARGET_MB_WORD_SWAP	 (1L<<3)
> @@ -6882,6 +6885,9 @@ struct bnx2 {
>  
>  	u8			num_tx_rings;
>  	u8			num_rx_rings;
> +
> +	u32			idle_chk_status_idx;
> +
>  };
>  
>  #define REG_RD(bp, offset)					\

Thanks a lot Michael

I tested your patch on my machine.

I confirm CPU unplug/hotplug works now (after first correcting a bug in oprofile).
The NIC doesn't freeze anymore.

Eric



* Re: [BUG] linux-2.6.28-rc3 regression: IRQ smp_affinities not respected
  2008-12-02  6:04           ` Eric Dumazet
@ 2008-12-03  8:36             ` David Miller
  0 siblings, 0 replies; 10+ messages in thread
From: David Miller @ 2008-12-03  8:36 UTC (permalink / raw)
  To: dada1; +Cc: mchan, linux-kernel, netdev

From: Eric Dumazet <dada1@cosmosbay.com>
Date: Tue, 02 Dec 2008 07:04:48 +0100

> Michael Chan wrote:
> > [PATCH] bnx2: Add workaround to handle missed MSI.
 ...
> Thanks a lot Michael
> 
> I tested your patch on my machine.
> 
> I confirm CPU unplug/hotplug works now (If correcting a bug in oprofile first)
> NIC doesnt freeze anymore

Applied to net-2.6, thanks everyone.
