* sky2 hangs
@ 2007-02-01 18:55 Thomas Glanzmann
  2007-02-01 19:05 ` Stephen Hemminger
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Thomas Glanzmann @ 2007-02-01 18:55 UTC (permalink / raw)
  To: LKML; +Cc: shemminger, netdev

Hello,
I have a sky2 network card in my Intel Mac mini. It stops working under
heavy network load, such as watching a DivX over http/sshfs. However, if I
remove the driver module and load it again, it works and even the TCP
connection doesn't get shut down. I automated the above procedure using
a userland watchdog which basically does the same thing and is written
entirely by me, because the traditional watchdog wasn't that reliable
and produced a lot of false positives:

        * Check every ten seconds whether my default router is pingable (3
          pings, at least one has to come back).
                If it isn't, I call the fix_network script (the script is
                called only once after a ping gets lost; to run it again, at
                least one ping has to arrive first)

                (mini) [~] cat /usr/local/sbin/fix_network
                #!/bin/bash

                export PATH=/bin:/usr/bin:/usr/sbin:/sbin

                rmmod sky2
                modprobe sky2
                ifdown eth0
                ifup eth0

                If after that no ping is received from the default
                router for another 90 seconds I tell init to reboot and
                stop feeding the kernel software watchdog.

        * My watchdog also checks if sshd process is running. If it is
          down for more than 100 seconds it reboots the machine, too.
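
The ping check described above could be sketched roughly as follows. This is
not the watchdog-tg program itself, only an illustration under assumptions:
the names decide, router_alive and watch_loop, the --run switch, the one
second ping timeout and the use of shutdown for the reboot are all made up
here, and the sshd check and the kernel software watchdog are left out.

```shell
#!/bin/bash
# Hypothetical sketch of the ping watchdog; not the original watchdog-tg.

ROUTER=${ROUTER:-192.168.0.3}                 # default router, as in the log
FIX_SCRIPT=${FIX_SCRIPT:-/usr/local/sbin/fix_network}
INTERVAL=10                                   # seconds between checks
REBOOT_AFTER=90                               # seconds to wait after the fix

# Succeeds if at least one of three pings is answered.
router_alive() {
    ping -c 3 -W 1 "$ROUTER" >/dev/null 2>&1
}

# Decide what to do after one check.
#   $1 = 1 if the router answered, else 0
#   $2 = 1 if the fix script already ran during this outage, else 0
#   $3 = seconds since the fix script ran
# Prints one of: ok, fix, wait, reboot
decide() {
    local alive=$1 fixed=$2 since_fix=$3
    if [ "$alive" -eq 1 ]; then
        echo ok
    elif [ "$fixed" -eq 0 ]; then
        echo fix
    elif [ "$since_fix" -ge "$REBOOT_AFTER" ]; then
        echo reboot
    else
        echo wait
    fi
}

watch_loop() {
    local fixed=0 fixed_at=0 alive
    while sleep "$INTERVAL"; do
        if router_alive; then alive=1; else alive=0; fi
        case $(decide "$alive" "$fixed" $(( SECONDS - fixed_at ))) in
            ok)     fixed=0 ;;                 # a ping arrived: re-arm the fix
            fix)    "$FIX_SCRIPT"; fixed=1; fixed_at=$SECONDS ;;
            reboot) /sbin/shutdown -r now ;;   # assumption: one way to reboot
            wait)   ;;
        esac
    done
}

# Only start the loop when asked explicitly.
if [ "${1:-}" = "--run" ]; then watch_loop; fi
```

Keeping the decision in a separate function makes the "run the fix only once
per outage" rule easy to check without touching the network.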

Jan 27 22:35:35 mini watchdog-tg[4146]: No PONG received from 192.168.0.3 (failure 1 of 10)
Jan 27 22:35:35 mini watchdog-tg[4146]: Running fix_network script.
Jan 27 22:38:46 mini watchdog-tg[4146]: No PONG received from 192.168.0.3 (failure 1 of 10)
Jan 27 22:38:46 mini watchdog-tg[4146]: Running fix_network script.
Jan 27 22:44:17 mini watchdog-tg[4146]: No PONG received from 192.168.0.3 (failure 1 of 10)
Jan 27 22:44:17 mini watchdog-tg[4146]: Running fix_network script.
Jan 29 12:00:13 mini watchdog-tg[4146]: No PONG received from 192.168.0.3 (failure 1 of 10)
Jan 29 12:00:13 mini watchdog-tg[4146]: Running fix_network script.
Jan 29 19:18:59 mini watchdog-tg[4146]: No PONG received from 192.168.0.3 (failure 1 of 10)
Jan 29 19:18:59 mini watchdog-tg[4146]: Running fix_network script.
Jan 31 15:56:29 mini watchdog-tg[4146]: No PONG received from 192.168.0.3 (failure 1 of 10)
Jan 31 15:56:29 mini watchdog-tg[4146]: Running fix_network script.
Feb  1 08:56:57 mini watchdog-tg[4146]: No PONG received from 192.168.0.3 (failure 1 of 10)
Feb  1 08:56:57 mini watchdog-tg[4146]: Running fix_network script.

I have a question about this: I wonder why the Linux kernel (no longer?)
increments the use counter of an Ethernet driver (I saw it on sky2 and
e1000) when the interface is up, running and configured. I can unload
the sky2 driver without doing an 'ifconfig eth0 down' beforehand. Could
someone provide me with background on this?

With that everything works. If someone is interested in my userland
watchdog, just send me an e-mail.

@Stephen: I can provide you access to my hardware including root access via
the wifi driver so that you can debug this network driver lockup, if you
want to.

        Thomas


* Re: sky2 hangs
  2007-02-01 18:55 sky2 hangs Thomas Glanzmann
@ 2007-02-01 19:05 ` Stephen Hemminger
  2007-02-01 19:19   ` Thomas Glanzmann
  2007-02-01 19:07 ` Stephen Hemminger
  2007-02-01 22:46 ` Fagyal Csongor
  2 siblings, 1 reply; 14+ messages in thread
From: Stephen Hemminger @ 2007-02-01 19:05 UTC (permalink / raw)
  To: Thomas Glanzmann; +Cc: LKML, shemminger, netdev

On Thu, 1 Feb 2007 19:55:32 +0100
Thomas Glanzmann <thomas@glanzmann.de> wrote:

> Hello,
> I have a sky2 network card in my Intel Mac mini. It stops working under
> heavy network load, such as watching a DivX over http/sshfs. However, if I
> remove the driver module and load it again, it works and even the TCP
> connection doesn't get shut down. I automated the above procedure using
> a userland watchdog which basically does the same thing and is written
> entirely by me, because the traditional watchdog wasn't that reliable
> and produced a lot of false positives:
> 
>         * Check every ten seconds whether my default router is pingable (3
>           pings, at least one has to come back).
>                 If it isn't, I call the fix_network script (the script is
>                 called only once after a ping gets lost; to run it again, at
>                 least one ping has to arrive first)
> 
>                 (mini) [~] cat /usr/local/sbin/fix_network
>                 #!/bin/bash
> 
>                 export PATH=/bin:/usr/bin:/usr/sbin:/sbin
> 
>                 rmmod sky2
>                 modprobe sky2
>                 ifdown eth0
>                 ifup eth0
> 
>                 If after that no ping is received from the default
>                 router for another 90 seconds I tell init to reboot and
>                 stop feeding the kernel software watchdog.
> 
>         * My watchdog also checks if sshd process is running. If it is
>           down for more than 100 seconds it reboots the machine, too.
> 
> Jan 27 22:35:35 mini watchdog-tg[4146]: No PONG received from 192.168.0.3 (failure 1 of 10)
> Jan 27 22:35:35 mini watchdog-tg[4146]: Running fix_network script.
> Jan 27 22:38:46 mini watchdog-tg[4146]: No PONG received from 192.168.0.3 (failure 1 of 10)
> Jan 27 22:38:46 mini watchdog-tg[4146]: Running fix_network script.
> Jan 27 22:44:17 mini watchdog-tg[4146]: No PONG received from 192.168.0.3 (failure 1 of 10)
> Jan 27 22:44:17 mini watchdog-tg[4146]: Running fix_network script.
> Jan 29 12:00:13 mini watchdog-tg[4146]: No PONG received from 192.168.0.3 (failure 1 of 10)
> Jan 29 12:00:13 mini watchdog-tg[4146]: Running fix_network script.
> Jan 29 19:18:59 mini watchdog-tg[4146]: No PONG received from 192.168.0.3 (failure 1 of 10)
> Jan 29 19:18:59 mini watchdog-tg[4146]: Running fix_network script.
> Jan 31 15:56:29 mini watchdog-tg[4146]: No PONG received from 192.168.0.3 (failure 1 of 10)
> Jan 31 15:56:29 mini watchdog-tg[4146]: Running fix_network script.
> Feb  1 08:56:57 mini watchdog-tg[4146]: No PONG received from 192.168.0.3 (failure 1 of 10)
> Feb  1 08:56:57 mini watchdog-tg[4146]: Running fix_network script.
> 
> I have a question about this: I wonder why the Linux kernel (no longer?)
> increments the use counter of an Ethernet driver (I saw it on sky2 and
> e1000) when the interface is up, running and configured. I can unload
> the sky2 driver without doing an 'ifconfig eth0 down' beforehand. Could
> someone provide me with background on this?

It was intentional in 2.6 to allow interfaces to be hot-removed.
Remember that with Internet protocols there is (normally) no hard binding
between address and device, and connections should not go down
if the link fails.
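
For readers who want to look at the use counter being discussed: it is the
third field of each line in /proc/modules (lsmod shows it in its "Used by"
column). A small sketch for reading it follows; the function name
module_refcount and its optional file argument are made up for illustration.

```shell
# Print the use count of a module from a /proc/modules-style file.
# Returns non-zero if the module is not listed.
module_refcount() {
    local module=$1 file=${2:-/proc/modules}
    awk -v m="$module" '$1 == m { print $3; found = 1 } END { exit !found }' "$file"
}
```

Running "module_refcount sky2" while eth0 is up shows whether the driver is
pinned by the interface on a given kernel.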

> 
> With that everything works. If someone is interested in my userland
> watchdog, just send me an e-mail.

Hopefully, it won't be necessary for long.


* Re: sky2 hangs
  2007-02-01 18:55 sky2 hangs Thomas Glanzmann
  2007-02-01 19:05 ` Stephen Hemminger
@ 2007-02-01 19:07 ` Stephen Hemminger
  2007-02-01 19:16   ` Thomas Glanzmann
  2007-02-01 22:46 ` Fagyal Csongor
  2 siblings, 1 reply; 14+ messages in thread
From: Stephen Hemminger @ 2007-02-01 19:07 UTC (permalink / raw)
  To: Thomas Glanzmann; +Cc: LKML, shemminger, netdev

On Thu, 1 Feb 2007 19:55:32 +0100
Thomas Glanzmann <thomas@glanzmann.de> wrote:

> Hello,
> I have a sky2 network card in my Intel Mac mini. It stops working under
> heavy network load, such as watching a DivX over http/sshfs.

Is this heavy Tx load (i.e. you're watching a movie served from the Mac mini)
or Rx load (you're watching a movie on the Mac mini)?


* Re: sky2 hangs
  2007-02-01 19:07 ` Stephen Hemminger
@ 2007-02-01 19:16   ` Thomas Glanzmann
  0 siblings, 0 replies; 14+ messages in thread
From: Thomas Glanzmann @ 2007-02-01 19:16 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: LKML, shemminger, netdev

Hello Stephen,

> Is this heavy Tx load (i.e. you're watching a movie served from the Mac mini)
> or Rx load (you're watching a movie on the Mac mini)?

It's inbound (Rx) traffic: watching a movie, a git pull from Linus, or an
scp of a kernel tarball from my laptop to my Mac mini.

        Thomas


* Re: sky2 hangs
  2007-02-01 19:05 ` Stephen Hemminger
@ 2007-02-01 19:19   ` Thomas Glanzmann
  0 siblings, 0 replies; 14+ messages in thread
From: Thomas Glanzmann @ 2007-02-01 19:19 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: LKML, shemminger, netdev

Hello Stephen,

> It was intentional in 2.6 to allow interfaces to be hot-removed.
> Remember that with Internet protocols there is (normally) no hard binding
> between address and device, and connections should not go down if the
> link fails.

Of course, that makes sense. I just wondered when the change of mind
happened. And actually I like this behaviour.

> > With that everything works. If someone is interested in my userland
> > watchdog, just send me an e-mail.

> Hopefully, it won't be necessary for long.

I hope so, too.

        Thomas


* Re: sky2 hangs
  2007-02-01 18:55 sky2 hangs Thomas Glanzmann
  2007-02-01 19:05 ` Stephen Hemminger
  2007-02-01 19:07 ` Stephen Hemminger
@ 2007-02-01 22:46 ` Fagyal Csongor
  2007-02-01 22:58   ` Stephen Hemminger
  2007-02-02  6:27   ` Thomas Glanzmann
  2 siblings, 2 replies; 14+ messages in thread
From: Fagyal Csongor @ 2007-02-01 22:46 UTC (permalink / raw)
  To: thomas; +Cc: linux-kernel, shemminger, netdev

> Hello,
> I have a sky2 network card in my Intel Mac mini. It stops working under
> heavy network load, such as watching a DivX over http/sshfs. However, if I
> remove the driver module and load it again, it works and even the TCP
> connection doesn't get shut down. I automated the above procedure using a
> userland watchdog which basically does the same thing and is written
> entirely by me, because the traditional watchdog wasn't that reliable
> and produced a lot of false positives:
>
>         * Check every ten seconds whether my default router is pingable (3
>           pings, at least one has to come back).
>                 If it isn't, I call the fix_network script (the script is
>                 called only once after a ping gets lost; to run it again, at
>                 least one ping has to arrive first)
>
>                 (mini) [~] cat /usr/local/sbin/fix_network
>                 #!/bin/bash
>
>                 export PATH=/bin:/usr/bin:/usr/sbin:/sbin
>
>                 rmmod sky2
>                 modprobe sky2
>                 ifdown eth0
>                 ifup eth0
>
>                 If after that no ping is received from the default
>                 router for another 90 seconds I tell init to reboot and
>                 stop feeding the kernel software watchdog.
>
>         * My watchdog also checks if sshd process is running. If it is
>           down for more than 100 seconds it reboots the machine, too.
>
> Jan 27 22:35:35 mini watchdog-tg[4146]: No PONG received from 192.168.0.3 (failure 1 of 10)
> Jan 27 22:35:35 mini watchdog-tg[4146]: Running fix_network script.
> Jan 27 22:38:46 mini watchdog-tg[4146]: No PONG received from 192.168.0.3 (failure 1 of 10)
> Jan 27 22:38:46 mini watchdog-tg[4146]: Running fix_network script.
> Jan 27 22:44:17 mini watchdog-tg[4146]: No PONG received from 192.168.0.3 (failure 1 of 10)
> Jan 27 22:44:17 mini watchdog-tg[4146]: Running fix_network script.
> Jan 29 12:00:13 mini watchdog-tg[4146]: No PONG received from 192.168.0.3 (failure 1 of 10)
> Jan 29 12:00:13 mini watchdog-tg[4146]: Running fix_network script.
> Jan 29 19:18:59 mini watchdog-tg[4146]: No PONG received from 192.168.0.3 (failure 1 of 10)
> Jan 29 19:18:59 mini watchdog-tg[4146]: Running fix_network script.
> Jan 31 15:56:29 mini watchdog-tg[4146]: No PONG received from 192.168.0.3 (failure 1 of 10)
> Jan 31 15:56:29 mini watchdog-tg[4146]: Running fix_network script.
> Feb  1 08:56:57 mini watchdog-tg[4146]: No PONG received from 192.168.0.3 (failure 1 of 10)
> Feb  1 08:56:57 mini watchdog-tg[4146]: Running fix_network script.
[...]

I would like to add a few things:

- a previously suggested fix - passing idle=poll to the kernel - did not
work for me in the end
- the lockups happen quite regularly (roughly every 22-28 hours), as if
the chip died after a given amount of data had been transferred; I know
this sounds stupid, but I thought I should mention it
- I have about 1 Mbit/s of (incoming) traffic on this interface, with
short, very high peaks, as there is a MySQL server on the other end,
receiving about 100 queries per second
- unloading the sky2 module totally freezes the computer for me
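
To put a rough number on the "given amount of data" idea: at a steady
1 Mbit/s, 22 to 28 hours correspond to roughly 10 to 13 GB received. The
short peaks mentioned above are ignored, so this is only an order of
magnitude. The helper name mb_transferred is made up for this sketch.

```shell
# Megabytes (10^6 bytes) transferred at a steady rate over a number of hours.
#   $1 = rate in bits per second, $2 = duration in hours
mb_transferred() {
    local bits_per_sec=$1 hours=$2
    echo $(( bits_per_sec * hours * 3600 / 8 / 1000000 ))
}

mb_transferred 1000000 22   # prints 9900
mb_transferred 1000000 28   # prints 12600
```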



- Fagzal




* Re: sky2 hangs
  2007-02-01 22:46 ` Fagyal Csongor
@ 2007-02-01 22:58   ` Stephen Hemminger
  2007-02-02  6:31     ` Thomas Glanzmann
  2007-02-02  6:27   ` Thomas Glanzmann
  1 sibling, 1 reply; 14+ messages in thread
From: Stephen Hemminger @ 2007-02-01 22:58 UTC (permalink / raw)
  To: Fagyal Csongor; +Cc: thomas, linux-kernel, netdev

I can reproduce the problem now (on mac mini). Interestingly it seems to whack
the whole ethernet switch when it happens.
> 
> - a previously suggested fix - passing idle=poll to the kernel - did not
> work for me in the end

It is not an MSI or IRQ problem. It is a PHY problem (see below).

> - the lockups happen quite regularly (roughly every 22-28 hours), as if
> the chip died after a given amount of data had been transferred; I know
> this sounds stupid, but I thought I should mention it
> - I have about 1 Mbit/s of (incoming) traffic on this interface, with
> short, very high peaks, as there is a MySQL server on the other end,
> receiving about 100 queries per second
> - unloading the sky2 module totally freezes the computer for me

If you do:
	ethtool -r eth0
it causes a PHY reset (renegotiation) and clears the problem.
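
A recovery script could try this lighter PHY reset first and keep the module
reload from the original fix_network as a fallback. The following is only a
sketch of that idea: the function names, the --soft/--hard switches and the
DRY_RUN variable are made up for illustration and are not part of any
existing script.

```shell
#!/bin/bash
# Hypothetical variant of fix_network: PHY renegotiation first, reload second.

IFACE=${IFACE:-eth0}

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Light fix: restart autonegotiation, which resets the PHY.
fix_soft() {
    run ethtool -r "$IFACE"
}

# Heavy fix: the module reload from the original script.
fix_hard() {
    run rmmod sky2
    run modprobe sky2
    run ifdown "$IFACE"
    run ifup "$IFACE"
}

if [ "${1:-}" = "--soft" ]; then
    fix_soft
elif [ "${1:-}" = "--hard" ]; then
    fix_hard
fi
```

Calling it with DRY_RUN=1 prints the commands that would be run, which is
handy on a machine where unloading the module is risky.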


-- 
Stephen Hemminger <shemminger@linux-foundation.org>


* Re: sky2 hangs
  2007-02-01 22:46 ` Fagyal Csongor
  2007-02-01 22:58   ` Stephen Hemminger
@ 2007-02-02  6:27   ` Thomas Glanzmann
  1 sibling, 0 replies; 14+ messages in thread
From: Thomas Glanzmann @ 2007-02-02  6:27 UTC (permalink / raw)
  To: Fagyal Csongor; +Cc: linux-kernel, shemminger, netdev

Hello Fagyal,

> - a previously suggested fix - passing idle=poll to the kernel - did not
> work for me in the end

Same for me. I tried the two module parameters and the kernel parameter:

pci=nomsi sky2.disable_msi=1 sky2.idle_timeout=1000
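
When comparing results like these, it helps to confirm which options the
running kernel was actually booted with. A minimal check against
/proc/cmdline might look like this; the function name cmdline_has and its
optional file argument are made up for illustration.

```shell
# Succeeds if the given parameter appears verbatim on the kernel command line.
#   $1 = parameter, e.g. pci=nomsi; $2 = file to read (default /proc/cmdline)
cmdline_has() {
    local param=$1 file=${2:-/proc/cmdline}
    tr ' ' '\n' < "$file" | grep -qxF -- "$param"
}
```

For example, "cmdline_has pci=nomsi && echo present" reports whether MSI was
disabled at boot.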

> - the lockups happen quite regularly (roughly every 22-28 hours), as if
> the chip died after a given amount of data had been transferred; I know
> this sounds stupid, but I thought I should mention it

I had a dedicated server with sky2 which had the same symptoms, but I
disabled the onboard sky2 and added an e100 instead. On my Mac mini I can
reproduce it nearly immediately: I just have to scp a kernel tarball over
and it hangs.

> - unloading the sky2 module totally freezes the computer for me

For me it works pretty well.

        Thomas


* Re: sky2 hangs
  2007-02-01 22:58   ` Stephen Hemminger
@ 2007-02-02  6:31     ` Thomas Glanzmann
  2007-02-02 10:44       ` Julien BLACHE
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Glanzmann @ 2007-02-02  6:31 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Fagyal Csongor, linux-kernel, netdev

Hello Stephen,

> I can reproduce the problem now (on mac mini). Interestingly it seems
> to whack the whole ethernet switch when it happens.

Wow. I have a Linksys WRT54G as my 'Ethernet switch', and my Snom 320 VoIP
phone still works when the mini's network card goes down. On the other
hand, the WRT54G isn't exactly a switch but more like a bunch of network
cards which use the Linux bridging code, IIRC.

> If you do:
> 	ethtool -r eth0
> it causes a PHY reset (renegotiation) and clears the problem.

But this isn't related to my problem (on the Mac mini), is it?

        Thomas


* Re: sky2 hangs
  2007-02-02  6:31     ` Thomas Glanzmann
@ 2007-02-02 10:44       ` Julien BLACHE
  2007-02-02 10:49         ` Thomas Glanzmann
  0 siblings, 1 reply; 14+ messages in thread
From: Julien BLACHE @ 2007-02-02 10:44 UTC (permalink / raw)
  To: Thomas Glanzmann; +Cc: Stephen Hemminger, Fagyal Csongor, linux-kernel, netdev

Thomas Glanzmann <thomas@glanzmann.de> wrote:

Hi,

>> I can reproduce the problem now (on mac mini). Interestingly it seems
>> to whack the whole ethernet switch when it happens.

I've observed that too, on a cheap DLink switch.

Next time sky2 hangs on me I'll try to reset the PHY and see if that
helps. I can usually trigger the hang by doing a couple of ifconfig
up/down on the interface, though I'm not getting any error message
from the driver when that happens.

JB.

-- 
Julien BLACHE                                   <http://www.jblache.org> 
<jb@jblache.org>                                  GPG KeyID 0xF5D65169


* Re: sky2 hangs
  2007-02-02 10:44       ` Julien BLACHE
@ 2007-02-02 10:49         ` Thomas Glanzmann
  2007-02-02 11:53           ` Fagyal Csongor
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Glanzmann @ 2007-02-02 10:49 UTC (permalink / raw)
  To: Julien BLACHE; +Cc: Stephen Hemminger, Fagyal Csongor, linux-kernel, netdev

Hello,

> Next time sky2 hangs on me I'll try to reset the PHY and see if that
> helps. I can usually trigger the hang by doing a couple of ifconfig
> up/down on the interface, though I'm not getting any error message
> from the driver when that happens.

Same for me. There is absolutely nothing in dmesg. I'm changing my fix
script, too, to see whether that is enough to resolve the problem.

        Thomas


* Re: sky2 hangs
  2007-02-02 10:49         ` Thomas Glanzmann
@ 2007-02-02 11:53           ` Fagyal Csongor
  2007-02-02 13:43             ` Jarek Poplawski
  0 siblings, 1 reply; 14+ messages in thread
From: Fagyal Csongor @ 2007-02-02 11:53 UTC (permalink / raw)
  To: Thomas Glanzmann; +Cc: Julien BLACHE, Stephen Hemminger, linux-kernel, netdev

Thomas Glanzmann wrote:

> Hello,
>
>> Next time sky2 hangs on me I'll try to reset the PHY and see if that
>> helps. I can usually trigger the hang by doing a couple of ifconfig
>> up/down on the interface, though I'm not getting any error message
>> from the driver when that happens.
>
> Same for me. There is absolutely nothing in dmesg. I'm changing my fix
> script, too, to see whether that is enough to resolve the problem.

Well, ethtool -r eth0 did not work for me. :(

This time I got nothing in the log.

When I run ethtool -r eth0, I get this:
sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both

But the interface stays down. (Maybe the other end got confused?)


- Cs.



* Re: sky2 hangs
  2007-02-02 11:53           ` Fagyal Csongor
@ 2007-02-02 13:43             ` Jarek Poplawski
  2007-02-02 14:13               ` Jarek Poplawski
  0 siblings, 1 reply; 14+ messages in thread
From: Jarek Poplawski @ 2007-02-02 13:43 UTC (permalink / raw)
  To: Fagyal Csongor
  Cc: Thomas Glanzmann, Julien BLACHE, Stephen Hemminger, linux-kernel, netdev

On 02-02-2007 12:53, Fagyal Csongor wrote:
> Thomas Glanzmann wrote:
...
>>> Next time sky2 hangs on me I'll try to reset the PHY and see if that
>>> helps. I can usually trigger the hang by doing a couple of ifconfig
>>> up/down on the interface, though I'm not getting any error message
>>> from the driver when that happens.
>>
>> Same for me. There is absolutely nothing in dmesg. I'm changing my fix
>> script, too, to see whether that is enough to resolve the problem.
>
> Well, ethtool -r eth0 did not work for me. :(
> 
> This time I got nothing in the log.
> 
> When I run ethtool -r eth0, I get this:
> sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both
> 
> But the interface stays down. (Maybe the other end got confused?)

Hi,

Is this with yesterday's sky2-tx-recover.patch applied?

Jarek P.


* Re: sky2 hangs
  2007-02-02 13:43             ` Jarek Poplawski
@ 2007-02-02 14:13               ` Jarek Poplawski
  0 siblings, 0 replies; 14+ messages in thread
From: Jarek Poplawski @ 2007-02-02 14:13 UTC (permalink / raw)
  To: Fagyal Csongor
  Cc: Thomas Glanzmann, Julien BLACHE, Stephen Hemminger, linux-kernel, netdev

On Fri, Feb 02, 2007 at 02:43:11PM +0100, Jarek Poplawski wrote:
> On 02-02-2007 12:53, Fagyal Csongor wrote:
> > Thomas Glanzmann wrote:
...
> Is this with yesterday's sky2-tx-recover.patch applied?

I mean the hangs, not ethtool.

Regards,
Jarek P.
