LKML Archive on lore.kernel.org
* net: tx timeouts with skge, 8139too, dmfe drivers/NICs
From: Marin Mitov @ 2008-02-25 20:37 UTC
To: linux-kernel

Hi all,

I experience very rare freezes under heavy outbound traffic
(sending a ~4GB DVD image to other host(s) on the same LAN)
using the skge driver (NIC on the mobo) as well as (recently tested)
using rtl8139 or dmfe NICs on the PCI bus. There is a single
switch between them (tested with another one just to exclude
a faulty switch).

skge    <--> Marvell 88E8001 chip
8139too <--> Realtek 8136B chip
dmfe    <--> Davicom DM9102 chip

Symptoms are similar: tx timeouts and no more net activity.
The KDE desktop works, computational programs work, the machine
is usable, but it cannot ping, nor can it be pinged anymore.
rmmod && modprobe of the respective module repairs the problem.
Simple surfing/e-mailing from it does not trigger the problem.

The machine is used as an LTSP server for old PCs (as X terminals)
(mostly outbound traffic) and is not usable as such due to this
problem.

The kernel is 2.6.24.2-SMP/x86_32 (PREEMPT or not - NO difference).

Since this happens with 3 different NICs/drivers, could it be
a problem in the networking subsystem (common to all of them)?

Since many people are working on this machine, only limited
testing can be done.

Thank you in advance for your suggestions, help (and patches).

Regards.

Marin Mitov
* Re: net: tx timeouts with skge, 8139too, dmfe drivers/NICs
From: Jeff Garzik @ 2008-02-25 20:53 UTC
To: Marin Mitov; +Cc: linux-kernel

Marin Mitov wrote:
> Hi all,
>
> I experience very rare freezes at heavy outbound traffic
> (sending ~4GB DVD image to another host(s) on the same LAN)
> using skge driver (NIC on the mobo) as well as (recently tested)
> using rtl8139 or dmfe NICs on the PCI bus. There is a single
> switch between them (tested with another one just to exclude
> a faulty switch).
>
> skge <--> Marvell 88E8001 chip
> 8139too <--> Realtek 8136B chip
> dmfe <--> Davicom DM9102 chip
>
> Symptoms are similar: tx timeouts and no more net activity.
> KDE desktop works, computational programs - work, the machine
> is usable, but cannot ping, nor can be ping-ed anymore.
> rmmod && modprobe the respective modules repairs the problem.
> Simple surfing/e-mailing from it do not trigger the problem.
>
> The machine is used as LTSP server for old PCs (as X terminals)
> (mostly outbound traffic) and is not usable as such due to this
> problem.
>
> The kernel is 2.6.24.2-SMP/x86_32 (PREEMPT or not - NO difference).
>
> As far as this happens with 3 different NICs/drivers could it be
> a problem in the (common for all of them) networking subsystem?

A TX timeout (like hardware timeouts, in general) is a very generic
behavior, with many causes.

In general, when you see timeouts with varied hardware and drivers,
you're almost always dealing with a problem with interrupt delivery, or
a generic system problem, rather than bugs in the network stack or all
three drivers.

    Jeff
* Re: net: tx timeouts with skge, 8139too, dmfe drivers/NICs
From: Marin Mitov @ 2008-02-25 21:36 UTC
To: linux-kernel

On Monday 25 February 2008 10:53:01 pm you wrote:
> Marin Mitov wrote:
> > Hi all,
> >
> > I experience very rare freezes at heavy outbound traffic
> > (sending ~4GB DVD image to another host(s) on the same LAN)
> > using skge driver (NIC on the mobo) as well as (recently tested)
> > using rtl8139 or dmfe NICs on the PCI bus. There is a single
> > switch between them (tested with another one just to exclude
> > a faulty switch).
> >
> > skge <--> Marvell 88E8001 chip
> > 8139too <--> Realtek 8136B chip
> > dmfe <--> Davicom DM9102 chip
> >
> > Symptoms are similar: tx timeouts and no more net activity.
> > KDE desktop works, computational programs - work, the machine
> > is usable, but cannot ping, nor can be ping-ed anymore.
> > rmmod && modprobe the respective modules repairs the problem.
> > Simple surfing/e-mailing from it do not trigger the problem.
> >
> > The machine is used as LTSP server for old PCs (as X terminals)
> > (mostly outbound traffic) and is not usable as such due to this
> > problem.
> >
> > The kernel is 2.6.24.2-SMP/x86_32 (PREEMPT or not - NO difference).
> >
> > As far as this happens with 3 different NICs/drivers could it be
> > a problem in the (common for all of them) networking subsystem?
>
> A TX timeout (like hardware timeouts, in general) is a very generic
> behavior, with many causes.
>
> In general, when you see timeouts with varied hardware and drivers,
> you're almost always dealing with a problem with interrupt delivery, or

All the drivers are using #INTA on the PCI bus (no MSI/MSI-X).

"problem with interrupt delivery" - do you suspect interrupts incorrectly
disabled (lost) in the drivers, or faulty hardware (motherboard)?

> a generic system problem, rather than bugs in the network stack or all

"a generic system problem" - bad config or faulty hardware (motherboard)?

Where should I look for the problem?

Just for info: the system is very stable - uptime (if no power outages) could
be a month or more (rebooting for kernel changes or updates).

Marin Mitov

> three drivers.
>
>     Jeff
* Re: net: tx timeouts with skge, 8139too, dmfe drivers/NICs
From: Stephen Hemminger @ 2008-02-25 21:42 UTC
To: Marin Mitov; +Cc: linux-kernel

On Mon, 25 Feb 2008 23:36:06 +0200
Marin Mitov <mitov@issp.bas.bg> wrote:

> On Monday 25 February 2008 10:53:01 pm you wrote:
> > Marin Mitov wrote:
> > > Hi all,
> > >
> > > I experience very rare freezes at heavy outbound traffic
> > > (sending ~4GB DVD image to another host(s) on the same LAN)
> > > using skge driver (NIC on the mobo) as well as (recently tested)
> > > using rtl8139 or dmfe NICs on the PCI bus. There is a single
> > > switch between them (tested with another one just to exclude
> > > a faulty switch).
> > >
> > > skge <--> Marvell 88E8001 chip
> > > 8139too <--> Realtek 8136B chip
> > > dmfe <--> Davicom DM9102 chip
> > >
> > > Symptoms are similar: tx timeouts and no more net activity.
> > > KDE desktop works, computational programs - work, the machine
> > > is usable, but cannot ping, nor can be ping-ed anymore.
> > > rmmod && modprobe the respective modules repairs the problem.
> > > Simple surfing/e-mailing from it do not trigger the problem.
> > >
> > > The machine is used as LTSP server for old PCs (as X terminals)
> > > (mostly outbound traffic) and is not usable as such due to this
> > > problem.
> > >
> > > The kernel is 2.6.24.2-SMP/x86_32 (PREEMPT or not - NO difference).
> > >
> > > As far as this happens with 3 different NICs/drivers could it be
> > > a problem in the (common for all of them) networking subsystem?
> >
> > A TX timeout (like hardware timeouts, in general) is a very generic
> > behavior, with many causes.
> >
> > In general, when you see timeouts with varied hardware and drivers,
> > you're almost always dealing with a problem with interrupt delivery, or
>
> All the drivers are using #INTA on PCI bus (no MSI/MSI-X).
>
> "problem with interrupt delivery" - you suspect interrupts incorrectly
> disabled (lost) in the drivers or faulty hardware(motherboard)?
>
> > a generic system problem, rather than bugs in the network stack or all
>
> "a generic system problem" - bad config or faulty hardware(motherboard)?
>
> Where I should look for the problem?
>
> Just for info: the system is very stable - uptime (if no power outages) could
> be a month or more (rebooting for kernel changes or updates).
>
> Marin Mitov

Make sure the interrupt is showing up as level triggered in /proc/interrupts.
The BIOS may be configuring it as edge-triggered and that won't work
with Ethernet drivers that use NAPI.
* Re: net: tx timeouts with skge, 8139too, dmfe drivers/NICs
From: Marin Mitov @ 2008-02-25 22:09 UTC
To: Stephen Hemminger; +Cc: linux-kernel

Hi Stephen,

> Make sure the interrupt is showing up as level triggered in
> /proc/interrupts. The BIOS may be configuring it as edge-triggered and that
> won't work with Ethernet drivers that use NAPI.

for: skge <--> Marvell 88E8001 chip
cat /proc/interrupts gives (AMD64 X2 SMP):
           CPU0       CPU1
 21:   11691000   11933174   IO-APIC-fasteoi   eth0

It is neither IO-APIC-edge, nor IO-APIC-level.

Could it be the problem?

Marin Mitov
* Re: net: tx timeouts with skge, 8139too, dmfe drivers/NICs
From: Stephen Hemminger @ 2008-02-25 22:57 UTC
To: Marin Mitov; +Cc: linux-kernel

On Tue, 26 Feb 2008 00:09:46 +0200
Marin Mitov <mitov@issp.bas.bg> wrote:

> Hi Stephen,
>
> > Make sure the interrupt is showing up as level triggered in
> > /proc/interrupts. The BIOS may be configuring it as edge-triggered and that
> > won't work with Ethernet drivers that use NAPI.
>
> for: skge <--> Marvell 88E8001 chip
> cat /proc/interrupts gives (AMD64 X2 SMP):
>            CPU0       CPU1
>  21:   11691000   11933174   IO-APIC-fasteoi   eth0
>
> It is neither IO-APIC-edge, nor IO-APIC-level.
>
> Could it be the problem?
>
> Marin Mitov

No, that isn't the problem.
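The check suggested in this sub-thread can be scripted. What follows is a minimal Python sketch, not something posted in the thread: `parse_interrupts`, `irq_for_device`, and `trigger_mode` are hypothetical helper names. The mapping of the `fasteoi` suffix to "level" is an assumption based on how x86 kernels of this period appear to label IO-APIC lines (level-triggered lines get the fasteoi flow handler, edge-triggered ones get edge). That reading would be consistent with the reply that IO-APIC-fasteoi is not the problem. The sample row is the eth0 line quoted earlier in the thread.

```python
SAMPLE = """\
           CPU0       CPU1
 21:   11691000   11933174   IO-APIC-fasteoi   eth0
"""


def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {label: (counts, trailing_fields)}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())  # header row lists one column per CPU
    rows = {}
    for ln in lines[1:]:
        label, sep, rest = ln.partition(":")
        if not sep:
            continue  # not a "label: ..." row
        fields = rest.split()
        counts = []
        # Leading numeric fields (at most one per CPU) are the counters.
        while fields and len(counts) < ncpu and fields[0].isdigit():
            counts.append(int(fields.pop(0)))
        rows[label.strip()] = (counts, fields)
    return rows


def irq_for_device(rows, device):
    """Return (irq_label, chip_and_handler) for the row listing `device`, else None."""
    for label, (_counts, fields) in rows.items():
        if not label.isdigit() or len(fields) < 2:
            continue  # skip summary rows such as NMI, LOC, ERR
        devices = [d.strip() for d in " ".join(fields[1:]).split(",")]
        if device in devices:
            return label, fields[0]
    return None


def trigger_mode(chip_and_handler):
    """Map the handler suffix to a trigger mode (assumed mapping, see the text above)."""
    suffix = chip_and_handler.rsplit("-", 1)[-1]
    return {"edge": "edge", "level": "level", "fasteoi": "level"}.get(suffix, "unknown")


irq, chip = irq_for_device(parse_interrupts(SAMPLE), "eth0")
print(irq, chip, trigger_mode(chip))
```

On a live system the input would be the contents of /proc/interrupts instead of SAMPLE, for example `parse_interrupts(open("/proc/interrupts").read())`.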
* Re: net: tx timeouts with skge, 8139too, dmfe drivers/NICs
From: Marin Mitov @ 2008-03-12 11:41 UTC
To: Jeff Garzik; +Cc: linux-kernel

On Monday 25 February 2008 10:53:01 pm you wrote:
> > As far as this happens with 3 different NICs/drivers could it be
> > a problem in the (common for all of them) networking subsystem?
>
> A TX timeout (like hardware timeouts, in general) is a very generic
> behavior, with many causes.
>
> In general, when you see timeouts with varied hardware and drivers,
> you're almost always dealing with a problem with interrupt delivery, or
> a generic system problem, rather than bugs in the network stack or all
> three drivers.

Well, this gave me a direction for research. Using printk in various
parts of the skge driver, as well as modifying it to collect different
statistics (read via ethtool -S eth0), I made the following observations
when it freezes:

1. Interrupts are generated (the status register shows there are pending
interrupts and they are NOT masked), but irq_handler is NOT invoked.

2. Looking at cat /proc/interrupts shows that when skge is working, both
CPUs receive all kinds of IRQs. When skge freezes, NO CPU receives skge's
interrupts: CPU[0] receives all other IRQs, but not skge's, and CPU[1] does
not receive any IRQ above the line (see below), but does receive LOC: and
RES: below the line.

# cat /proc/interrupts
           CPU0       CPU1
  0:         85          1   IO-APIC-edge      timer
  1:      34078          9   IO-APIC-edge      i8042
  6:          1          4   IO-APIC-edge      floppy
  7:        216          1   IO-APIC-edge      parport0
  8:          0          1   IO-APIC-edge      rtc
  9:          0          0   IO-APIC-fasteoi   acpi
 12:     893003    1390080   IO-APIC-edge      i8042
 14:      59682     286628   IO-APIC-edge      ide0
 15:    5458527         12   IO-APIC-edge      ide1
 16:   60547054          1   IO-APIC-fasteoi   mga@pci:0000:01:00.0
 17:    1634623     914447   IO-APIC-fasteoi   sata_via
 18:       7768          7   IO-APIC-fasteoi   sata_promise
 19:          0          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb3, uhci_hcd:usb4, uhci_hcd:usb5
 20:     535380          1   IO-APIC-fasteoi   VIA8237
 21:   30780380   31448992   IO-APIC-fasteoi   eth0
---------line added by me----------------------------------
NMI:          0          0   Non-maskable interrupts
LOC:  154311126  154736178   Local timer interrupts
RES:    1325239    2423719   Rescheduling interrupts
CAL:      40893        456   function call interrupts
TLB:      52651      29184   TLB shootdowns
TRM:          0          0   Thermal event interrupts
SPU:          0          0   Spurious interrupts
ERR:          0
MIS:          0

That looks like IRQs are somehow disabled (at the IO-APIC/LAPIC?) at some
priority and below. I should add that after a freeze, ifconfig down/up
(+routing info) does NOT solve the problem, while rmmod/modprobe of the
driver makes it work again.

So I moved the request_irq()/free_irq() calls from the driver's
probe()/release() methods to its open()/stop() methods. Thus modified,
when skge freezes, ifconfig down/up makes it work again (no need to
rmmod/modprobe).

That makes me think that skge's IRQ is somehow disabled OUTSIDE the driver
and that free_irq()/request_irq() clears the problem. Am I wrong? Is that
possible? How could this happen?

Any comments/suggestions/patches welcome.

Regards

Marin Mitov
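Observation 2 amounts to comparing two readings of /proc/interrupts and seeing which rows have stopped counting. A minimal Python sketch of that comparison follows; it is not from the thread. `irq_counts`, `stalled_rows`, and `per_cpu_delta` are hypothetical names. BEFORE reuses three rows from the listing above, while AFTER is invented purely for illustration (eth0 frozen, IRQ 12 advancing on CPU0 only, LOC advancing on both CPUs) and is not measured data.

```python
BEFORE = """\
           CPU0       CPU1
 12:     893003    1390080   IO-APIC-edge      i8042
 21:   30780380   31448992   IO-APIC-fasteoi   eth0
LOC:  154311126  154736178   Local timer interrupts
"""

# Hypothetical second reading taken some time after a freeze (illustrative only).
AFTER = """\
           CPU0       CPU1
 12:     893060    1390080   IO-APIC-edge      i8042
 21:   30780380   31448992   IO-APIC-fasteoi   eth0
LOC:  154312126  154737178   Local timer interrupts
"""


def irq_counts(text):
    """Map each row label of /proc/interrupts-style text to its per-CPU counts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())  # header row lists one column per CPU
    counts = {}
    for ln in lines[1:]:
        label, sep, rest = ln.partition(":")
        if not sep:
            continue  # separator or other non-row line
        nums = []
        for field in rest.split():
            if len(nums) == ncpu or not field.isdigit():
                break  # counters end at the first non-numeric field
            nums.append(int(field))
        counts[label.strip()] = nums
    return counts


def stalled_rows(before, after):
    """Return labels whose counters did not change on any CPU between readings."""
    a, b = irq_counts(before), irq_counts(after)
    return sorted(label for label in a if label in b and a[label] == b[label])


def per_cpu_delta(before, after, label):
    """Return how far each CPU's counter advanced for one row."""
    a, b = irq_counts(before)[label], irq_counts(after)[label]
    return [new - old for old, new in zip(a, b)]


print("stalled:", stalled_rows(BEFORE, AFTER))
print("IRQ 12 delta per CPU:", per_cpu_delta(BEFORE, AFTER, "12"))
```

A row that is stalled while the device's status register still reports pending, unmasked interrupts would match observation 1. A row that advances on one CPU only, like IRQ 12 in the invented AFTER reading, would match the CPU[1] behaviour described in observation 2.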