LKML Archive on lore.kernel.org
* USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351
@ 2007-08-15 13:31 Florin Iucha
  2007-08-15 14:38 ` [linux-usb-devel] " Alan Stern
  2007-08-15 14:49 ` Jiri Kosina
  0 siblings, 2 replies; 35+ messages in thread
From: Florin Iucha @ 2007-08-15 13:31 UTC (permalink / raw)
  To: linux-usb-devel, Linux Kernel Mailing List; +Cc: Michal Piotrowski, Greg KH

Today my USB keyboard stopped working in the middle of composing an
e-mail.  I unplugged it and plugged it back in, with no success.  I
logged in remotely and found this lovely message:

[ 1301.567351] usb 1-4: USB disconnect, address 3
[ 1301.567356] usb 1-4.2: USB disconnect, address 5
[ 1301.567599] sysfs_remove_bin_file: bad dentry or inode or no such file: "descriptors"
[ 1301.567604] 
[ 1301.567605] Call Trace:
[ 1301.567614]  [<ffffffff802b89a6>] sysfs_remove_bin_file+0x39/0x3d
[ 1301.567619]  [<ffffffff803f1d24>] device_remove_bin_file+0x15/0x17
[ 1301.567623]  [<ffffffff8045ea0d>] usb_remove_sysfs_dev_files+0x89/0x9d
[ 1301.567627]  [<ffffffff804625cc>] generic_disconnect+0x2e/0x32
[ 1301.567630]  [<ffffffff8045b58d>] usb_unbind_device+0x15/0x19
[ 1301.567634]  [<ffffffff803f4161>] __device_release_driver+0x93/0xb3
[ 1301.567637]  [<ffffffff803f45af>] device_release_driver+0x31/0x49
[ 1301.567640]  [<ffffffff803f39f1>] bus_remove_device+0x76/0x87
[ 1301.567644]  [<ffffffff803f2014>] device_del+0x216/0x297
[ 1301.567648]  [<ffffffff80455c8f>] usb_disconnect+0xc8/0x151
[ 1301.567651]  [<ffffffff80455c56>] usb_disconnect+0x8f/0x151
[ 1301.567655]  [<ffffffff804564c8>] hub_thread+0x442/0xc47
[ 1301.567659]  [<ffffffff80553cdb>] _spin_unlock_irq+0x9/0xc
[ 1301.567664]  [<ffffffff80244ad7>] autoremove_wake_function+0x0/0x38
[ 1301.567668]  [<ffffffff80456086>] hub_thread+0x0/0xc47
[ 1301.567671]  [<ffffffff802449cb>] kthread+0x49/0x76
[ 1301.567674]  [<ffffffff8020c618>] child_rip+0xa/0x12
[ 1301.567679]  [<ffffffff80244982>] kthread+0x0/0x76
[ 1301.567682]  [<ffffffff8020c60e>] child_rip+0x0/0x12
[ 1301.567684] 

I have rebooted, and while composing this message, I thought it useful to
include the output from 'lsusb'.  Funnily enough, lsusb does not list any
devices.  A 'cat /proc/bus/usb/devices' yields the following:

T:  Bus=02 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#=  1 Spd=12  MxCh=10
B:  Alloc= 25/900 us ( 3%), #Int=  5, #Iso=  0
D:  Ver= 1.10 Cls=09(hub  ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=0000 ProdID=0000 Rev= 2.06
S:  Manufacturer=Linux 2.6.23-rc3-1 ohci_hcd
S:  Product=OHCI Host Controller
S:  SerialNumber=0000:00:02.0
C:* #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=  0mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=09(hub  ) Sub=00 Prot=00 Driver=hub
E:  Ad=81(I) Atr=03(Int.) MxPS=   2 Ivl=255ms

T:  Bus=02 Lev=01 Prnt=01 Port=02 Cnt=01 Dev#=  2 Spd=12  MxCh= 4
D:  Ver= 1.10 Cls=09(hub  ) Sub=00 Prot=00 MxPS= 8 #Cfgs=  1
P:  Vendor=05f3 ProdID=0081 Rev= 3.10
S:  Manufacturer=PI Engineering
S:  Product=Kinesis Keyboard Hub
C:* #Ifs= 1 Cfg#= 1 Atr=a0 MxPwr= 50mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=09(hub  ) Sub=00 Prot=00 Driver=hub
E:  Ad=81(I) Atr=03(Int.) MxPS=   1 Ivl=255ms

T:  Bus=02 Lev=02 Prnt=02 Port=01 Cnt=01 Dev#=  3 Spd=12  MxCh= 0
D:  Ver= 1.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs=  1
P:  Vendor=05f3 ProdID=0007 Rev= 3.10
C:* #Ifs= 2 Cfg#= 1 Atr=a0 MxPwr= 64mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=03(HID  ) Sub=01 Prot=01 Driver=usbhid
E:  Ad=81(I) Atr=03(Int.) MxPS=   8 Ivl=8ms
I:* If#= 1 Alt= 0 #EPs= 1 Cls=03(HID  ) Sub=00 Prot=00 Driver=usbhid
E:  Ad=82(I) Atr=03(Int.) MxPS=   4 Ivl=8ms

T:  Bus=02 Lev=02 Prnt=02 Port=02 Cnt=02 Dev#=  4 Spd=1.5 MxCh= 0
D:  Ver= 1.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs=  1
P:  Vendor=05bc ProdID=0102 Rev= 2.00
S:  Manufacturer=Forward
S:  Product=USB Optical Mouse
C:* #Ifs= 1 Cfg#= 1 Atr=a0 MxPwr=100mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=03(HID  ) Sub=01 Prot=02 Driver=usbhid
E:  Ad=81(I) Atr=03(Int.) MxPS=   4 Ivl=10ms

T:  Bus=02 Lev=02 Prnt=02 Port=03 Cnt=03 Dev#=  5 Spd=1.5 MxCh= 0
D:  Ver= 1.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs=  1
P:  Vendor=045e ProdID=0039 Rev= 1.21
S:  Manufacturer=Microsoft
S:  Product=Microsoft IntelliMouse® Optical
C:* #Ifs= 1 Cfg#= 1 Atr=a0 MxPwr=100mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=03(HID  ) Sub=01 Prot=02 Driver=usbhid
E:  Ad=81(I) Atr=03(Int.) MxPS=   4 Ivl=10ms

T:  Bus=01 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#=  1 Spd=480 MxCh=10
B:  Alloc=  0/800 us ( 0%), #Int=  1, #Iso=  0
D:  Ver= 2.00 Cls=09(hub  ) Sub=00 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor=0000 ProdID=0000 Rev= 2.06
S:  Manufacturer=Linux 2.6.23-rc3-1 ehci_hcd
S:  Product=EHCI Host Controller
S:  SerialNumber=0000:00:02.1
C:* #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=  0mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=09(hub  ) Sub=00 Prot=00 Driver=hub
E:  Ad=81(I) Atr=03(Int.) MxPS=   4 Ivl=256ms

T:  Bus=01 Lev=01 Prnt=01 Port=03 Cnt=01 Dev#=  3 Spd=480 MxCh= 4
D:  Ver= 2.00 Cls=09(hub  ) Sub=00 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor=0409 ProdID=0058 Rev= 1.00
S:  Manufacturer=NEC Corporation
S:  Product=USB2.0 Hub Controller
C:* #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=100mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=09(hub  ) Sub=00 Prot=00 Driver=hub
E:  Ad=81(I) Atr=03(Int.) MxPS=   1 Ivl=256ms

T:  Bus=01 Lev=02 Prnt=03 Port=01 Cnt=01 Dev#=  5 Spd=1.5 MxCh= 0
D:  Ver= 1.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs=  1
P:  Vendor=0fe9 ProdID=9010 Rev= 1.00
S:  Manufacturer=DVICO
S:  Product=DVICO USB HID Remocon V1.00
C:* #Ifs= 1 Cfg#= 1 Atr=80 MxPwr=100mA
I:* If#= 0 Alt= 0 #EPs= 2 Cls=03(HID  ) Sub=00 Prot=00 Driver=usbhid
E:  Ad=81(I) Atr=03(Int.) MxPS=   3 Ivl=32ms
E:  Ad=02(O) Atr=03(Int.) MxPS=   1 Ivl=32ms

T:  Bus=01 Lev=01 Prnt=01 Port=08 Cnt=02 Dev#=  4 Spd=480 MxCh= 0
D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=07cc ProdID=0501 Rev=91.44
S:  Manufacturer=USB2.0
S:  Product=CardReader
S:  SerialNumber=1234609 
C:* #Ifs= 1 Cfg#= 1 Atr=80 MxPwr=500mA
I:* If#= 0 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage
E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms

I am testing each rcX kernel, and I have not seen this problem before.
Smells like a new regression.

florin

-- 
Bruce Schneier expects the Spanish Inquisition.
      http://geekz.co.uk/schneierfacts/fact/163


* Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351
  2007-08-15 13:31 USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351 Florin Iucha
@ 2007-08-15 14:38 ` Alan Stern
  2007-08-15 14:50   ` Florin Iucha
  2007-08-15 14:54   ` Tejun Heo
  2007-08-15 14:49 ` Jiri Kosina
  1 sibling, 2 replies; 35+ messages in thread
From: Alan Stern @ 2007-08-15 14:38 UTC (permalink / raw)
  To: Florin Iucha
  Cc: Tejun Heo, USB development list, Linux Kernel Mailing List,
	Greg KH, Michal Piotrowski

On Wed, 15 Aug 2007, Florin Iucha wrote:

> Today my USB keyboard stopped working in the middle of composing an
> e-mail.  I unplugged it and plugged it back in, with no success.  I
> logged in remotely and found this lovely message:
> 
> [ 1301.567351] usb 1-4: USB disconnect, address 3
> [ 1301.567356] usb 1-4.2: USB disconnect, address 5
> [ 1301.567599] sysfs_remove_bin_file: bad dentry or inode or no such file: "descriptors"
> [ 1301.567604] 
> [ 1301.567605] Call Trace:
> [ 1301.567614]  [<ffffffff802b89a6>] sysfs_remove_bin_file+0x39/0x3d
> [ 1301.567619]  [<ffffffff803f1d24>] device_remove_bin_file+0x15/0x17
> [ 1301.567623]  [<ffffffff8045ea0d>] usb_remove_sysfs_dev_files+0x89/0x9d
> [ 1301.567627]  [<ffffffff804625cc>] generic_disconnect+0x2e/0x32
> [ 1301.567630]  [<ffffffff8045b58d>] usb_unbind_device+0x15/0x19
> [ 1301.567634]  [<ffffffff803f4161>] __device_release_driver+0x93/0xb3
> [ 1301.567637]  [<ffffffff803f45af>] device_release_driver+0x31/0x49
> [ 1301.567640]  [<ffffffff803f39f1>] bus_remove_device+0x76/0x87
> [ 1301.567644]  [<ffffffff803f2014>] device_del+0x216/0x297
> [ 1301.567648]  [<ffffffff80455c8f>] usb_disconnect+0xc8/0x151
> [ 1301.567651]  [<ffffffff80455c56>] usb_disconnect+0x8f/0x151
> [ 1301.567655]  [<ffffffff804564c8>] hub_thread+0x442/0xc47
> [ 1301.567659]  [<ffffffff80553cdb>] _spin_unlock_irq+0x9/0xc
> [ 1301.567664]  [<ffffffff80244ad7>] autoremove_wake_function+0x0/0x38
> [ 1301.567668]  [<ffffffff80456086>] hub_thread+0x0/0xc47
> [ 1301.567671]  [<ffffffff802449cb>] kthread+0x49/0x76
> [ 1301.567674]  [<ffffffff8020c618>] child_rip+0xa/0x12
> [ 1301.567679]  [<ffffffff80244982>] kthread+0x0/0x76
> [ 1301.567682]  [<ffffffff8020c60e>] child_rip+0x0/0x12

I think we can simply remove the error message.  There's no obvious 
reason why sysfs_remove_bin_file() should complain about attempts to 
remove a nonexistent file; sysfs_remove_file() doesn't.

This patch will get rid of the annoying error messages.  It won't do 
anything about your keyboard's tendency to spontaneously stop working, 
alas.

Alan Stern


Index: usb-2.6/fs/sysfs/bin.c
===================================================================
--- usb-2.6.orig/fs/sysfs/bin.c
+++ usb-2.6/fs/sysfs/bin.c
@@ -248,12 +248,7 @@ int sysfs_create_bin_file(struct kobject
 
 void sysfs_remove_bin_file(struct kobject * kobj, struct bin_attribute * attr)
 {
-	if (sysfs_hash_and_remove(kobj->sd, attr->attr.name) < 0) {
-		printk(KERN_ERR "%s: "
-			"bad dentry or inode or no such file: \"%s\"\n",
-			__FUNCTION__, attr->attr.name);
-		dump_stack();
-	}
+	sysfs_hash_and_remove(kobj->sd, attr->attr.name);
 }
 
 EXPORT_SYMBOL_GPL(sysfs_create_bin_file);



* Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351
  2007-08-15 13:31 USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351 Florin Iucha
  2007-08-15 14:38 ` [linux-usb-devel] " Alan Stern
@ 2007-08-15 14:49 ` Jiri Kosina
  2007-08-15 14:53   ` Florin Iucha
  1 sibling, 1 reply; 35+ messages in thread
From: Jiri Kosina @ 2007-08-15 14:49 UTC (permalink / raw)
  To: Florin Iucha
  Cc: linux-usb-devel, Linux Kernel Mailing List, Greg KH, Michal Piotrowski

On Wed, 15 Aug 2007, Florin Iucha wrote:

> Today my USB keyboard stopped working in the middle of composing an
> e-mail.  I unplugged it and plugged it back in, with no success.  I
> logged in remotely and found this lovely message:

The error message seems unrelated to your keyboard becoming dead.

> I am testing each rcX kernel, and I have not seen this problem before.
> Smells like a new regression.

Is that reproducible, or did it happen just once? Are there any error
messages in the log prior to that sysfs dump?

Thanks,

-- 
Jiri Kosina


* Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351
  2007-08-15 14:38 ` [linux-usb-devel] " Alan Stern
@ 2007-08-15 14:50   ` Florin Iucha
  2007-08-15 15:24     ` Alan Stern
  2007-08-15 14:54   ` Tejun Heo
  1 sibling, 1 reply; 35+ messages in thread
From: Florin Iucha @ 2007-08-15 14:50 UTC (permalink / raw)
  To: Alan Stern
  Cc: Tejun Heo, USB development list, Linux Kernel Mailing List,
	Greg KH, Michal Piotrowski

On Wed, Aug 15, 2007 at 10:38:54AM -0400, Alan Stern wrote:
> This patch will get rid of the annoying error messages.  It won't do 
> anything about your keyboard's tendency to spontaneously stop working, 
> alas.

My keyboard works fine for days with kernels up to and including
2.6.23-rc2.  I booted into 2.6.23-rc3-$whatever this morning, and
after 10-15 minutes the keyboard stopped working.  The mice that were
plugged into the keyboard's built-in hub were fine, though.

The first time it happened, I removed the keyboard and got the oops
that started this thread.  The second time, I just logged in remotely
and rebooted, and the reboot process stopped at the "KILLING all
processes" step.  I simply reset the box and rebooted into 2.6.23-rc2,
and it has been fine since (over an hour now).

Regards,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
      http://geekz.co.uk/schneierfacts/fact/163


* Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351
  2007-08-15 14:49 ` Jiri Kosina
@ 2007-08-15 14:53   ` Florin Iucha
  2007-08-15 14:58     ` Jiri Kosina
  0 siblings, 1 reply; 35+ messages in thread
From: Florin Iucha @ 2007-08-15 14:53 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: linux-usb-devel, Linux Kernel Mailing List, Greg KH, Michal Piotrowski

On Wed, Aug 15, 2007 at 04:49:02PM +0200, Jiri Kosina wrote:
> On Wed, 15 Aug 2007, Florin Iucha wrote:
> 
> > Today my USB keyboard stopped working in the middle of composing an
> > e-mail.  I unplugged it and plugged it back in, with no success.  I
> > logged in remotely and found this lovely message:
> 
> The error message seems unrelated to your keyboard becoming dead.

Yes, it was related to me unplugging it in the hope that a re-plug
would make it work again ;)

> > I am testing each rcX kernel, and I have not seen this problem before.
> > Smells like a new regression.
> 
> Is that reproducible, or did it happen just once? Are there any error
> messages in the log prior to that sysfs dump?

[See my message to Alan]: It happened twice, within 15 minutes of
boot+login, with 2.6.23-rc3-$whatever.  It does not happen with
2.6.2[123](-rc*)?  After the two incidents, I rebooted into 2.6.23-rc2
and it has been working for an hour now.

Regards,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
      http://geekz.co.uk/schneierfacts/fact/163


* Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351
  2007-08-15 14:38 ` [linux-usb-devel] " Alan Stern
  2007-08-15 14:50   ` Florin Iucha
@ 2007-08-15 14:54   ` Tejun Heo
  2007-08-15 15:21     ` Cornelia Huck
  2007-08-15 15:33     ` Alan Stern
  1 sibling, 2 replies; 35+ messages in thread
From: Tejun Heo @ 2007-08-15 14:54 UTC (permalink / raw)
  To: Alan Stern
  Cc: Florin Iucha, USB development list, Linux Kernel Mailing List,
	Greg KH, Michal Piotrowski

Alan Stern wrote:
> I think we can simply remove the error message.  There's no obvious 
> reason why sysfs_remove_bin_file() should complain about attempts to 
> remove a nonexistent file; sysfs_remove_file() doesn't.
> 
> This patch will get rid of the annoying error messages.  It won't do 
> anything about your keyboard's tendency to spontaneously stop working, 
> alas.

Agreed, but I think sysfs_remove_bin_file() should relay the return value
from sysfs_hash_and_remove() to the caller.

Thanks.

-- 
tejun


* Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351
  2007-08-15 14:53   ` Florin Iucha
@ 2007-08-15 14:58     ` Jiri Kosina
  2007-08-21 11:51       ` Florin Iucha
  0 siblings, 1 reply; 35+ messages in thread
From: Jiri Kosina @ 2007-08-15 14:58 UTC (permalink / raw)
  To: Florin Iucha
  Cc: linux-usb-devel, Linux Kernel Mailing List, Greg KH, Michal Piotrowski

On Wed, 15 Aug 2007, Florin Iucha wrote:

> [See my message to Alan]: It happened twice, within 15 minutes of
> boot+login, with 2.6.23-rc3-$whatever.  It does not happen with
> 2.6.2[123](-rc*)?  After the two incidents, I rebooted into 2.6.23-rc2 and
> it has been working for an hour now.

It is not immediately clear what might be causing this; 2.6.23-rc3 didn't
get any USB or HID updates at all compared to 2.6.23-rc2.

Could you please enable USB and HID debugging to see whether we can see 
anything spurious in the logs at the time the keyboard gets stuck?

Bisecting this might be a bit painful if it is not reproducible in 
predictable timeframes :(

-- 
Jiri Kosina


* Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351
  2007-08-15 14:54   ` Tejun Heo
@ 2007-08-15 15:21     ` Cornelia Huck
  2007-08-15 15:30       ` Tejun Heo
  2007-08-15 15:33     ` Alan Stern
  1 sibling, 1 reply; 35+ messages in thread
From: Cornelia Huck @ 2007-08-15 15:21 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Alan Stern, Florin Iucha, USB development list,
	Linux Kernel Mailing List, Greg KH, Michal Piotrowski,
	Randy. Dunlap

On Wed, 15 Aug 2007 23:54:43 +0900,
Tejun Heo <htejun@gmail.com> wrote:

> Alan Stern wrote:
> > I think we can simply remove the error message.  There's no obvious 
> > reason why sysfs_remove_bin_file() should complain about attempts to 
> > remove a nonexistent file; sysfs_remove_file() doesn't.
> > 
> > This patch will get rid of the annoying error messages.  It won't do 
> > anything about your keyboard's tendency to spontaneously stop working, 
> > alas.
> 
> Agreed, but I think sysfs_remove_bin_file() should relay the return value
> from sysfs_hash_and_remove() to the caller.

Three comments:

- Randy made sysfs_remove_bin_file() return void in commit
995982ca79d9262869513948ec7c540f32035491.

- For symmetry reasons, sysfs_remove_file() should then also pass the
return value on.

- I'm not sure who wants to care whether they removed an existing or
non-existing file. But maybe I'm just unimaginative.


* Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351
  2007-08-15 14:50   ` Florin Iucha
@ 2007-08-15 15:24     ` Alan Stern
  0 siblings, 0 replies; 35+ messages in thread
From: Alan Stern @ 2007-08-15 15:24 UTC (permalink / raw)
  To: Florin Iucha
  Cc: Michal Piotrowski, Tejun Heo, Greg KH, USB development list,
	Linux Kernel Mailing List

On Wed, 15 Aug 2007, Florin Iucha wrote:

> On Wed, Aug 15, 2007 at 10:38:54AM -0400, Alan Stern wrote:
> > This patch will get rid of the annoying error messages.  It won't do 
> > anything about your keyboard's tendency to spontaneously stop working, 
> > alas.
> 
> My keyboard works fine for days with kernels up to and including
> 2.6.23-rc2.  I booted into 2.6.23-rc3-$whatever this morning, and
> after 10-15 minutes the keyboard stopped working.  The mice that were
> plugged into the keyboard's built-in hub were fine, though.
> 
> The first time it happened, I removed the keyboard and got the oops
> that started this thread.

It wasn't an oops, just a warning.

>  The second time, I just logged in remotely
> and rebooted, and the reboot process stopped at the "KILLING all
> processes" step.  I simply reset the box and rebooted into 2.6.23-rc2,
> and it has been fine since (over an hour now).

To track this down, you might try building 2.6.23-rc3 with 
CONFIG_USB_DEBUG enabled.  Then retrieve the dmesg log after the 
keyboard stops working and post it.  You probably ought to CC: the 
maintainer of the HID core layer as well (and you can trim the existing 
CC: list).

Alan Stern



* Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351
  2007-08-15 15:21     ` Cornelia Huck
@ 2007-08-15 15:30       ` Tejun Heo
  0 siblings, 0 replies; 35+ messages in thread
From: Tejun Heo @ 2007-08-15 15:30 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Alan Stern, Florin Iucha, USB development list,
	Linux Kernel Mailing List, Greg KH, Michal Piotrowski,
	Randy. Dunlap

Cornelia Huck wrote:
> On Wed, 15 Aug 2007 23:54:43 +0900,
> Tejun Heo <htejun@gmail.com> wrote:
> 
>> Alan Stern wrote:
>>> I think we can simply remove the error message.  There's no obvious 
>>> reason why sysfs_remove_bin_file() should complain about attempts to 
>>> remove a nonexistent file; sysfs_remove_file() doesn't.
>>>
>>> This patch will get rid of the annoying error messages.  It won't do 
>>> anything about your keyboard's tendency to spontaneously stop working, 
>>> alas.
>> Agreed, but I think sysfs_remove_bin_file() should relay the return value
>> from sysfs_hash_and_remove() to the caller.
> 
> Three comments:
> 
> - Randy made sysfs_remove_bin_file() return void in commit
> 995982ca79d9262869513948ec7c540f32035491.
> 
> - For symmetry reasons, sysfs_remove_file() should then also pass the
> return value on.
> 
> - I'm not sure who wants to care whether they removed an existing or
> non-existing file. But maybe I'm just unimaginative.

Hmmm... Well, failure information is lost there, so I was a bit worried.
It probably doesn't really matter and can be easily changed later if
needed.  If sysfs_remove_file() returns void, I have no objection.

Thanks.

-- 
tejun


* Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351
  2007-08-15 14:54   ` Tejun Heo
  2007-08-15 15:21     ` Cornelia Huck
@ 2007-08-15 15:33     ` Alan Stern
  1 sibling, 0 replies; 35+ messages in thread
From: Alan Stern @ 2007-08-15 15:33 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Michal Piotrowski, Florin Iucha, Greg KH, USB development list,
	Linux Kernel Mailing List

On Wed, 15 Aug 2007, Tejun Heo wrote:

> Alan Stern wrote:
> > I think we can simply remove the error message.  There's no obvious 
> > reason why sysfs_remove_bin_file() should complain about attempts to 
> > remove a nonexistent file; sysfs_remove_file() doesn't.
> > 
> > This patch will get rid of the annoying error messages.  It won't do 
> > anything about your keyboard's tendency to spontaneously stop working, 
> > alas.
> 
> Agreed, but I think sysfs_remove_bin_file() should relay the return value
> from sysfs_hash_and_remove() to the caller.

Perhaps.  But none of

	sysfs_remove_one()
	sysfs_remove_subdir()
	sysfs_remove_dir()
	sysfs_remove_file()
	sysfs_remove_file_from_group()
	sysfs_remove_group()
	sysfs_remove_link()

return a value.  Why should sysfs_remove_bin_file() be different?  And 
what callers would pay attention to the return value?

Alan Stern



* Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351
  2007-08-15 14:58     ` Jiri Kosina
@ 2007-08-21 11:51       ` Florin Iucha
  2007-08-21 12:04         ` Jiri Kosina
                           ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Florin Iucha @ 2007-08-21 11:51 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: linux-usb-devel, Linux Kernel Mailing List, Greg KH, Michal Piotrowski

On Wed, Aug 15, 2007 at 04:58:33PM +0200, Jiri Kosina wrote:
> On Wed, 15 Aug 2007, Florin Iucha wrote:
> 
> > [See my message to Alan]: It happened twice, within 15 minutes of
> > boot+login, with 2.6.23-rc3-$whatever.  It does not happen with
> > 2.6.2[123](-rc*)?  After the two incidents, I rebooted into 2.6.23-rc2 and
> > it has been working for an hour now.
> 
> It is not immediately clear what might be causing this; 2.6.23-rc3 didn't
> get any USB or HID updates at all compared to 2.6.23-rc2.
> 
> Could you please enable USB and HID debugging to see whether we can see 
> anything spurious in the logs at the time the keyboard gets stuck?

Jiri,

I have enabled USB debugging and I see a bunch (=46) of these messages:

   [  $timestamp] usb 1-9: usb auto-suspend
   [  $timestamp] usb 1-9: usb auto-resume
   [  $timestamp] ehci_hcd 0000:00:02.1: GetStatus port 9 status 001005 POWER sig=se0 PE CONNECT
   [  $timestamp] usb 1-9: finish resume

The messages continued to be logged, even after the keyboard has
become unresponsive.

The entire kernel log is at http://iucha.net/usb/log-2.6.23-rc3-2 .
The dump of /proc/bus/usb/devices is at http://iucha.net/usb/devices .
The output of 'lsusb -t' is at http://iucha.net/usb/lsusb-t .  Plain
lsusb is not working.  The version of usbutils is '0.72-7ubuntu2' .

Do you need me to build a -rc2 with USB debug enabled to compare and
contrast?

Thanks,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
      http://geekz.co.uk/schneierfacts/fact/163


* Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351
  2007-08-21 11:51       ` Florin Iucha
@ 2007-08-21 12:04         ` Jiri Kosina
  2007-08-21 12:28           ` Florin Iucha
  2007-08-21 14:51           ` Alan Stern
  2007-08-21 12:06         ` Oliver Neukum
  2007-08-21 12:57         ` Florin Iucha
  2 siblings, 2 replies; 35+ messages in thread
From: Jiri Kosina @ 2007-08-21 12:04 UTC (permalink / raw)
  To: Florin Iucha
  Cc: linux-usb-devel, Linux Kernel Mailing List, Greg KH, Michal Piotrowski

On Tue, 21 Aug 2007, Florin Iucha wrote:

> I have enabled USB debugging and I see a bunch (=46) of these messages:

>    [  $timestamp] usb 1-9: usb auto-suspend
>    [  $timestamp] usb 1-9: usb auto-resume
>    [  $timestamp] ehci_hcd 0000:00:02.1: GetStatus port 9 status 001005 POWER sig=se0 PE CONNECT
>    [  $timestamp] usb 1-9: finish resume
> The messages continued to be logged, even after the keyboard has
> become unresponsive.

I guess that this is the card reader being suspended and resumed 
afterwards. Do you by any chance see any improvement when you

- rmmod ehci_hcd
- disable USB_AUTOSUSPEND

please? Thanks,

-- 
Jiri Kosina


* Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351
  2007-08-21 11:51       ` Florin Iucha
  2007-08-21 12:04         ` Jiri Kosina
@ 2007-08-21 12:06         ` Oliver Neukum
  2007-08-21 12:09           ` Jiri Kosina
  2007-08-21 12:19           ` Oliver Neukum
  2007-08-21 12:57         ` Florin Iucha
  2 siblings, 2 replies; 35+ messages in thread
From: Oliver Neukum @ 2007-08-21 12:06 UTC (permalink / raw)
  To: linux-usb-devel, Greg KH
  Cc: Florin Iucha, Jiri Kosina, Michal Piotrowski, Linux Kernel Mailing List

Am Dienstag 21 August 2007 schrieb Florin Iucha:
> On Wed, Aug 15, 2007 at 04:58:33PM +0200, Jiri Kosina wrote:
> > On Wed, 15 Aug 2007, Florin Iucha wrote:
> > 
> > > [See my message to Alan]: It happened twice, within 15 minutes of
> > > boot+login, with 2.6.23-rc3-$whatever.  It does not happen with
> > > 2.6.2[123](-rc*)?  After the two incidents, I rebooted into 2.6.23-rc2 and
> > > it has been working for an hour now.
> > 
> > It is not immediately clear what might be causing this; 2.6.23-rc3 didn't
> > get any USB or HID updates at all compared to 2.6.23-rc2.
> > 
> > Could you please enable USB and HID debugging to see whether we can see 
> > anything spurious in the logs at the time the keyboard gets stuck?
> 
> Jiri,
> 
> I have enabled USB debugging and I see a bunch (=46) of these messages:
> 
>    [  $timestamp] usb 1-9: usb auto-suspend
>    [  $timestamp] usb 1-9: usb auto-resume
>    [  $timestamp] ehci_hcd 0000:00:02.1: GetStatus port 9 status 001005 POWER sig=se0 PE CONNECT
>    [  $timestamp] usb 1-9: finish resume
> 
> The messages continued to be logged, even after the keyboard has
> become unresponsive.

[   60.756730] usb 1-9: usb auto-resume
Did you hit a key at that time?


It looks like your keyboard gets autosuspended. But how can that happen?
Keyboards should never autosuspend, as they are always open.
The patch for minimum autosuspend support in HID did get in earlier,
didn't it?

	Regards
		Oliver




* Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351
  2007-08-21 12:06         ` Oliver Neukum
@ 2007-08-21 12:09           ` Jiri Kosina
  2007-08-21 12:19           ` Oliver Neukum
  1 sibling, 0 replies; 35+ messages in thread
From: Jiri Kosina @ 2007-08-21 12:09 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: linux-usb-devel, Greg KH, Florin Iucha, Michal Piotrowski,
	Linux Kernel Mailing List

On Tue, 21 Aug 2007, Oliver Neukum wrote:

> It looks like your keyboard gets autosuspended. But how can that happen? 
> Keyboards should never autosuspend, as they are always open. The patch 
> for minimum autosuspend support in HID did get in earlier, didn't it?

Hi Oliver,

it actually is not even in mainline yet; it's queued in my tree for the next
merge window.

-- 
Jiri Kosina


* Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351
  2007-08-21 12:06         ` Oliver Neukum
  2007-08-21 12:09           ` Jiri Kosina
@ 2007-08-21 12:19           ` Oliver Neukum
  1 sibling, 0 replies; 35+ messages in thread
From: Oliver Neukum @ 2007-08-21 12:19 UTC (permalink / raw)
  To: linux-usb-devel
  Cc: Greg KH, Linux Kernel Mailing List, Florin Iucha, Michal Piotrowski

Am Dienstag 21 August 2007 schrieb Oliver Neukum:
> [   60.756730] usb 1-9: usb auto-resume
> Did you hit a key at that time?
> 
> 
> It looks like your keyboard gets autosuspended. But how can that happen?
> Keyboards should never autosuspend, as they are always open.
> The patch for minimum autosuspend support in HID did get in earlier,
> didn't it?
> 

Sorry, disregard the question; I mistook your devices.

	Regards
		Oliver



* Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351
  2007-08-21 12:04         ` Jiri Kosina
@ 2007-08-21 12:28           ` Florin Iucha
  2007-08-21 14:51           ` Alan Stern
  1 sibling, 0 replies; 35+ messages in thread
From: Florin Iucha @ 2007-08-21 12:28 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: linux-usb-devel, Linux Kernel Mailing List, Greg KH, Michal Piotrowski

On Tue, Aug 21, 2007 at 02:04:26PM +0200, Jiri Kosina wrote:
> > I have enabled USB debugging and I see a bunch (=46) of these messages:
> 
> >    [  $timestamp] usb 1-9: usb auto-suspend
> >    [  $timestamp] usb 1-9: usb auto-resume
> >    [  $timestamp] ehci_hcd 0000:00:02.1: GetStatus port 9 status 001005 POWER sig=se0 PE CONNECT
> >    [  $timestamp] usb 1-9: finish resume
> > The messages continued to be logged, even after the keyboard has
> > become unresponsive.
> 
> I guess that this is the card reader being suspended and resumed 
> afterwards. Do you by any chance see any improvement when you
> 
> - rmmod ehci_hcd

It's built-in.  Should I build it as a module?  This machine has only
usb 2.0 ports.  If I rmmod it, will my USB keyboard still work?

[The card reader is one of those that fit into a 3.5" bay, connected
straight to the motherboard controller, so it's a bit of a pain to
disconnect.]

> - disable USB_AUTOSUSPEND

You mean CONFIG_USB_SUSPEND?

florin

-- 
Bruce Schneier expects the Spanish Inquisition.
      http://geekz.co.uk/schneierfacts/fact/163


* Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351
  2007-08-21 11:51       ` Florin Iucha
  2007-08-21 12:04         ` Jiri Kosina
  2007-08-21 12:06         ` Oliver Neukum
@ 2007-08-21 12:57         ` Florin Iucha
  2007-08-21 13:05           ` Jiri Kosina
  2 siblings, 1 reply; 35+ messages in thread
From: Florin Iucha @ 2007-08-21 12:57 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: linux-usb-devel, Linux Kernel Mailing List, Greg KH, Michal Piotrowski

On Tue, Aug 21, 2007 at 06:51:15AM -0500, Florin Iucha wrote:
> On Wed, Aug 15, 2007 at 04:58:33PM +0200, Jiri Kosina wrote:
> > On Wed, 15 Aug 2007, Florin Iucha wrote:
> > 
> > > [See my message to Alan]: It happened twice, within 15 minutes of 
> > > > boot+login, with 2.6.23-rc3-$whatever .  It does not happen with 
> > > > 2.6.2[123](-rc*)?  After the two incidents, I rebooted in 2.6.23-rc2 and 
> > > > it has been working for an hour now.
> > 
> > It is not immediately clear what might be causing this, 2.6.23-rc3 didn't 
> > get any USB nor HID updates at all compared to 2.6.23-rc2.
> > 
> > Could you please enable USB and HID debugging to see whether we can see 
> > anything spurious in the logs at the time the keyboard gets stuck?
> 
> Jiri,
> 
> I have enabled USB debugging and I see a bunch (=46) of these messages:
> 
>    [  $timestamp] usb 1-9: usb auto-suspend
>    [  $timestamp] usb 1-9: usb auto-resume
>    [  $timestamp] ehci_hcd 0000:00:02.1: GetStatus port 9 status 001005 POWER sig=se0 PE CONNECT
>    [  $timestamp] usb 1-9: finish resume
> 
> The messages continued to be logged, even after the keyboard has
> become unresponsive.

[snip]

> Do you need me to build a -rc2 with USB debug enabled to compare and
> contrast?

With 2.6.23-rc2 and USB_DEBUG enabled, I see the same messages but no
keyboard "disappearance".

I have rebuilt 2.6.23-rc3 with 'CONFIG_USB_EHCI_HCD=m' and
'CONFIG_USB_SUSPEND is not set' and will use it for a while, to see if 
the keyboard/usb behaves or not.
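
One way to double-check that a rebuilt kernel really carries these two
settings is to read them back from the build's .config.  The sketch below
uses Python purely for illustration; the name config_value and the sample
text are hypothetical, and it assumes the usual .config conventions of
"CONFIG_FOO=value" for enabled options and "# CONFIG_FOO is not set" for
disabled ones.

```python
def config_value(config_text, option):
    """Return the value of a kernel config option, or None if it is not set."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {option} is not set":
            return None
        # Match "OPTION=" exactly so that OPTION_SOMETHING_ELSE is not picked up.
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    # An option that is absent altogether is also treated as not set.
    return None


# Hypothetical excerpt mirroring the two settings named in the message.
SAMPLE = """\
CONFIG_USB_EHCI_HCD=m
# CONFIG_USB_SUSPEND is not set
"""

print(config_value(SAMPLE, "CONFIG_USB_EHCI_HCD"))  # prints m
print(config_value(SAMPLE, "CONFIG_USB_SUSPEND"))   # prints None
```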

Regards,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
      http://geekz.co.uk/schneierfacts/fact/163

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]


* Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351
  2007-08-21 12:57         ` Florin Iucha
@ 2007-08-21 13:05           ` Jiri Kosina
  2007-08-21 13:17             ` Florin Iucha
  0 siblings, 1 reply; 35+ messages in thread
From: Jiri Kosina @ 2007-08-21 13:05 UTC (permalink / raw)
  To: Florin Iucha
  Cc: linux-usb-devel, Linux Kernel Mailing List, Greg KH, Michal Piotrowski

On Tue, 21 Aug 2007, Florin Iucha wrote:

> I have rebuilt 2.6.23-rc3 with 'CONFIG_USB_EHCI_HCD=m' and 
> 'CONFIG_USB_SUSPEND is not set' and will use it for a while, to see if 
> the keyboard/usb behaves or not.

Thanks. If this doesn't give us any hint, it would be useful if you could 
do git-bisect between rc2 and rc3, I really can't immediately see anything 
in the list of commits that might directly cause the behavior you are 
seeing (most importantly because there were no USB and no HID updates in 
this window).

There are approximately 290 commits, so it shouldn't require more than 9 
reboots plus the time needed to check whether the bug triggers or not.
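
The figure of 9 follows from halving the range on every round: each
build-and-test step discards roughly half of the remaining commits, so the
count is about the base-2 logarithm of the range size, rounded up.  A small
Python sketch of that arithmetic (bisect_steps is a hypothetical helper, and
the real number of rounds can differ slightly on merge-heavy history):

```python
import math


def bisect_steps(commit_count):
    """Approximate upper bound on test rounds when bisecting commit_count commits."""
    return math.ceil(math.log2(commit_count))


# About 290 commits lie between -rc2 and the tested tree.
print(bisect_steps(290))  # prints 9
```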

Thanks,

-- 
Jiri Kosina


* Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351
  2007-08-21 13:05           ` Jiri Kosina
@ 2007-08-21 13:17             ` Florin Iucha
  2007-08-21 13:27               ` Florin Iucha
  0 siblings, 1 reply; 35+ messages in thread
From: Florin Iucha @ 2007-08-21 13:17 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: linux-usb-devel, Linux Kernel Mailing List, Greg KH, Michal Piotrowski

[-- Attachment #1: Type: text/plain, Size: 1290 bytes --]

On Tue, Aug 21, 2007 at 03:05:25PM +0200, Jiri Kosina wrote:
> > I have rebuilt 2.6.23-rc3 with 'CONFIG_USB_EHCI_HCD=m' and 
> > 'CONFIG_USB_SUSPEND is not set' and will use it for a while, to see if 
> > the keyboard/usb behaves or not.
> 
> Thanks. If this doesn't give us any hint, it would be useful if you could 
> do git-bisect between rc2 and rc3, I really can't immediately see anything 
> in the list of commits that might directly cause the behavior you are 
> seeing (most importantly because there were no USB and no HID updates in 
> this window).

The keyboard still locked up.  There is absolutely nothing in the
kernel log.

> There are approximately 290 commits, so it shouldn't require more than 9 
> reboots plus the time needed to check whether the bug triggers or not.

The top commit is not v2.6.23-rc3 but

   commit 28e8351ac22de25034e048c680014ad824323c65
   Merge: 3b993e8... d18c4d6...
   Author: Linus Torvalds <torvalds@woody.linux-foundation.org>
   Date:   Tue Aug 14 10:00:29 2007 -0700

       Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes

I'll try to make time to bisect it...

Thanks,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
      http://geekz.co.uk/schneierfacts/fact/163

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]


* Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351
  2007-08-21 13:17             ` Florin Iucha
@ 2007-08-21 13:27               ` Florin Iucha
  2007-08-21 13:42                 ` Jiri Kosina
  0 siblings, 1 reply; 35+ messages in thread
From: Florin Iucha @ 2007-08-21 13:27 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: linux-usb-devel, Linux Kernel Mailing List, Michal Piotrowski

[-- Attachment #1: Type: text/plain, Size: 2001 bytes --]

On Tue, Aug 21, 2007 at 08:17:59AM -0500, Florin Iucha wrote:
> On Tue, Aug 21, 2007 at 03:05:25PM +0200, Jiri Kosina wrote:
> > > I have rebuilt 2.6.23-rc3 with 'CONFIG_USB_EHCI_HCD=m' and 
> > > 'CONFIG_USB_SUSPEND is not set' and will use it for a while, to see if 
> > > the keyboard/usb behaves or not.
> > 
> > Thanks. If this doesn't give us any hint, it would be useful if you could 
> > do git-bisect between rc2 and rc3, I really can't immediately see anything 
> > in the list of commits that might directly cause the behavior you are 
> > seeing (most importantly because there were no USB and no HID updates in 
> > this window).
> 
> The keyboard still locked up.  There is absolutely nothing in the
> kernel log.
> 
> > There are approximately 290 commits, so it shouldn't require more than 9 
> > reboots plus the time needed to check whether the bug triggers or not.
> 
> The top commit is not v2.6.23-rc3 but
> 
>    commit 28e8351ac22de25034e048c680014ad824323c65
>    Merge: 3b993e8... d18c4d6...
>    Author: Linus Torvalds <torvalds@woody.linux-foundation.org>
>    Date:   Tue Aug 14 10:00:29 2007 -0700
> 
>        Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes
> 
> I'll try to make time to bisect it...

There is another interesting angle to this: in the past, every time I
had keyboard problems, it used to be caused by the VFS and/or NFS...
after much wrangling, a bunch of bugs were fixed (Hi Trond, Peter,
Alan!).

Now, after the keyboard "locked up", I used the mouse to close the
gnome session, then I logged in remotely to reboot.  The reboot
process locked up and I needed to use the reset button!  The second
time the keyboard "locked up" I listed my processes, and I noticed
that I had a couple of bash processes and a ssh process in "D" state.

Something is fishy again in the VFS ;)

florin

-- 
Bruce Schneier expects the Spanish Inquisition.
      http://geekz.co.uk/schneierfacts/fact/163

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]


* Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351
  2007-08-21 13:27               ` Florin Iucha
@ 2007-08-21 13:42                 ` Jiri Kosina
  2007-08-22 13:22                   ` Florin Iucha
  0 siblings, 1 reply; 35+ messages in thread
From: Jiri Kosina @ 2007-08-21 13:42 UTC (permalink / raw)
  To: Florin Iucha
  Cc: linux-usb-devel, Linux Kernel Mailing List, Michal Piotrowski,
	trond.myklebust

On Tue, 21 Aug 2007, Florin Iucha wrote:

> There is another interesting angle to this: in the past, every time I 
> had keyboard problems, it used to be caused by the VFS and/or NFS... 
> after much wrangling, a bunch of bugs were fixed (Hi Trond, Peter, 
> Alan!). Now, after the keyboard "locked up", I used the mouse to close 
> the gnome session, then I logged-in remotely to reboot.  The reboot 
> process locked up and I need to use the reset button!  The second time 
> the keyboard "locked up" I listed my processes, and I noticed that I had 
> a couple of bash processes and a ssh process in "D" state. Something is 
> fishy again in the VFS ;)

Yes, there were some NFS updates in between -rc2 and 
28e8351ac22de25034e048c680014ad824323c65. I'm now even more curious 
what you are going to find by bisecting; please let us know.

I added Trond to CC; the full thread can be found at 
http://lkml.org/lkml/2007/8/21/151 for reference.

Florin, it also might be useful to capture the states of the stuck processes 
via alt-sysrq-T (or better by echo t > /proc/sysrq-trigger), so that we 
know better where they are stuck.
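
Before dumping every task, it can help to list just the processes sitting in
"D" (uninterruptible sleep) state.  The following Python sketch reads the
per-process stat files under /proc; the function names are hypothetical, and
it only assumes the documented layout of /proc/<pid>/stat, where the command
name appears in parentheses and is followed by the one-letter state.

```python
import os


def parse_stat(stat_line):
    """Split one /proc/<pid>/stat line into (pid, command, state)."""
    # The command sits in parentheses and may itself contain spaces or ')',
    # so locate it between the first '(' and the last ')'.
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    pid = int(stat_line[:open_paren])
    command = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 1:].split()[0]
    return pid, command, state


def uninterruptible_tasks(proc_root="/proc"):
    """List (pid, command) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, command, state = parse_stat(handle.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            found.append((pid, command))
    return found
```

Calling uninterruptible_tasks() on the affected machine would give the PIDs
to look for in the sysrq-T output.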

-- 
Jiri Kosina


* Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351
  2007-08-21 12:04         ` Jiri Kosina
  2007-08-21 12:28           ` Florin Iucha
@ 2007-08-21 14:51           ` Alan Stern
  1 sibling, 0 replies; 35+ messages in thread
From: Alan Stern @ 2007-08-21 14:51 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Florin Iucha, Michal Piotrowski, Greg KH, linux-usb-devel,
	Linux Kernel Mailing List

On Tue, 21 Aug 2007, Jiri Kosina wrote:

> On Tue, 21 Aug 2007, Florin Iucha wrote:
> 
> > I have enabled USB debugging and I see a bunch (=46) of these messages:
> 
> >    [  $timestamp] usb 1-9: usb auto-suspend
> >    [  $timestamp] usb 1-9: usb auto-resume
> >    [  $timestamp] ehci_hcd 0000:00:02.1: GetStatus port 9 status 001005 POWER sig=se0 PE CONNECT
> >    [  $timestamp] usb 1-9: finish resume
> > The messages continued to be logged, even after the keyboard has
> > become unresponsive.
> 
> I guess that this is the card reader being suspended and resumed 
> afterwards. Do you by any chance see any improvement when you

FYI, the card reader suspend/resume problem should be fixed by this 
patch:

	http://marc.info/?l=linux-usb-devel&m=118764229910761&w=2

Alan Stern



* Re: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351
  2007-08-21 13:42                 ` Jiri Kosina
@ 2007-08-22 13:22                   ` Florin Iucha
  2007-08-23 12:52                     ` NFS woes again Was: " Florin Iucha
  0 siblings, 1 reply; 35+ messages in thread
From: Florin Iucha @ 2007-08-22 13:22 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: linux-usb-devel, Linux Kernel Mailing List, Michal Piotrowski,
	trond.myklebust

[-- Attachment #1: Type: text/plain, Size: 1549 bytes --]

On Tue, Aug 21, 2007 at 03:42:26PM +0200, Jiri Kosina wrote:
> On Tue, 21 Aug 2007, Florin Iucha wrote:
> 
> > There is another interesting angle to this: in the past, every time I 
> > had keyboard problems, it used to be caused by the VFS and/or NFS... 
> > after much wrangling, a bunch of bugs were fixed (Hi Trond, Peter, 
> > Alan!). Now, after the keyboard "locked up", I used the mouse to close 
> > the gnome session, then I logged-in remotely to reboot.  The reboot 
> > process locked up and I need to use the reset button!  The second time 
> > the keyboard "locked up" I listed my processes, and I noticed that I had 
> > a couple of bash processes and a ssh process in "D" state. Something is 
> > fishy again in the VFS ;)
> 
> Yes, there were some NFS updates in between -rc2 and 
> 28e8351ac22de25034e048c680014ad824323c65. I'd be now even more curious 
> what are you going to find by bisect, please let us know.
> 
> I added Trond to CC, full thread to be found at 
> http://lkml.org/lkml/2007/8/21/151 for reference.
> 
> Florin, it also might be useful to capture the states of stuck processess 
> via alt-sysrq-T (or better by echo t > /proc/sysrq-trigger), so that we 
> know better where are they stuck.

This morning it took a bit longer to hang, but it happened.  The
backtraces are at http://iucha.net/2.6.23-rc3/backtraces.gz .

I'll try a bisect session this weekend.

Cheers,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
      http://geekz.co.uk/schneierfacts/fact/163

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]


* NFS woes again Was: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351
  2007-08-22 13:22                   ` Florin Iucha
@ 2007-08-23 12:52                     ` Florin Iucha
  2007-08-23 17:14                       ` Bret Towe
  0 siblings, 1 reply; 35+ messages in thread
From: Florin Iucha @ 2007-08-23 12:52 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Jiri Kosina, linux-usb-devel, Linux Kernel Mailing List,
	Michal Piotrowski

[-- Attachment #1: Type: text/plain, Size: 3132 bytes --]

Trond,

Fess up... I'm closing in:

   http://iucha.net/2.6.23-rc3/2.6.23-rc-bisect.png

[Dropping Jiri and linux-usb-devel from future postings.  You are
included now just for communicating the conclusion of this thread.]

On Wed, Aug 22, 2007 at 08:22:00AM -0500, Florin Iucha wrote:
> On Tue, Aug 21, 2007 at 03:42:26PM +0200, Jiri Kosina wrote:
> > On Tue, 21 Aug 2007, Florin Iucha wrote:
> > 
> > > There is another interesting angle to this: in the past, every time I 
> > > had keyboard problems, it used to be caused by the VFS and/or NFS... 
> > > after much wrangling, a bunch of bugs were fixed (Hi Trond, Peter, 
> > > Alan!). Now, after the keyboard "locked up", I used the mouse to close 
> > > the gnome session, then I logged-in remotely to reboot.  The reboot 
> > > process locked up and I need to use the reset button!  The second time 
> > > the keyboard "locked up" I listed my processes, and I noticed that I had 
> > > a couple of bash processes and a ssh process in "D" state. Something is 
> > > fishy again in the VFS ;)
> > 
> > Yes, there were some NFS updates in between -rc2 and 
> > 28e8351ac22de25034e048c680014ad824323c65. I'd be now even more curious 
> > what are you going to find by bisect, please let us know.
> > 
> > I added Trond to CC, full thread to be found at 
> > http://lkml.org/lkml/2007/8/21/151 for reference.
> > 
> > Florin, it also might be useful to capture the states of stuck processess 
> > via alt-sysrq-T (or better by echo t > /proc/sysrq-trigger), so that we 
> > know better where are they stuck.
> 
> This morning it took a bit longer to hang, but it happened.  The
> backtraces are at http://iucha.net/2.6.23-rc3/backtraces.gz .
> 
> I'll try a bisect session this weekend.

florin@zeus $ git bisect bad
Bisecting: 5 revisions left to test after this
florin@zeus $ git bisect log
git-bisect start
# good: [d4ac2477fad0f2680e84ec12e387ce67682c5c13] Linux 2.6.23-rc2
git-bisect good d4ac2477fad0f2680e84ec12e387ce67682c5c13
# bad: [28e8351ac22de25034e048c680014ad824323c65] Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes
git-bisect bad 28e8351ac22de25034e048c680014ad824323c65
# bad: [8f2ea1fd3f97ab7a809e939b5b9005a16f862439] [POWERPC] Fix initialization and usage of dma_mask
git-bisect bad 8f2ea1fd3f97ab7a809e939b5b9005a16f862439
# good: [ff95f3df54609d9d4b9572f8a67d09922a645043] sched: remove the 'u64 now' parameter from pick_next_task()
git-bisect good ff95f3df54609d9d4b9572f8a67d09922a645043
# good: [be12014dd7750648fde33e1e45cac24dc9a8be6d] Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched
git-bisect good be12014dd7750648fde33e1e45cac24dc9a8be6d
# good: [6a0ed91e361a93ee1efb4c20c4967024ed2a8dd7] hexdump: use const notation
git-bisect good 6a0ed91e361a93ee1efb4c20c4967024ed2a8dd7
# bad: [6adb31c90c47262c8a25bf5097de9b3426caf3ae] remove dubious legal statment from uio-howto
git-bisect bad 6adb31c90c47262c8a25bf5097de9b3426caf3ae
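
For readers following along, the state of the search can be read straight
off the log: the most recent "good" and "bad" marks bracket the commits still
under suspicion.  A rough Python sketch of that reading (latest_marks is a
hypothetical helper; "most recently marked" is taken in log order, which is
an approximation and not a statement about ancestry):

```python
def latest_marks(log_text):
    """Return the most recently marked good and bad commits from a bisect log."""
    latest = {}
    for raw in log_text.splitlines():
        # Accept both the old "git-bisect good <sha>" and "git bisect good <sha>".
        words = raw.replace("git bisect ", "git-bisect ", 1).split()
        if len(words) == 3 and words[0] == "git-bisect" and words[1] in ("good", "bad"):
            latest[words[1]] = words[2]
    return latest


# The last two marks from the log above.
LOG = """\
git-bisect start
# good: [6a0ed91e361a93ee1efb4c20c4967024ed2a8dd7] hexdump: use const notation
git-bisect good 6a0ed91e361a93ee1efb4c20c4967024ed2a8dd7
# bad: [6adb31c90c47262c8a25bf5097de9b3426caf3ae] remove dubious legal statment from uio-howto
git-bisect bad 6adb31c90c47262c8a25bf5097de9b3426caf3ae
"""

print(latest_marks(LOG))
```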

florin

-- 
Bruce Schneier expects the Spanish Inquisition.
      http://geekz.co.uk/schneierfacts/fact/163

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]


* Re: NFS woes again Was: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351
  2007-08-23 12:52                     ` NFS woes again Was: " Florin Iucha
@ 2007-08-23 17:14                       ` Bret Towe
  2007-08-23 17:36                         ` Florin Iucha
  0 siblings, 1 reply; 35+ messages in thread
From: Bret Towe @ 2007-08-23 17:14 UTC (permalink / raw)
  To: Florin Iucha
  Cc: Trond Myklebust, Jiri Kosina, linux-usb-devel,
	Linux Kernel Mailing List, Michal Piotrowski

On 8/23/07, Florin Iucha <florin@iucha.net> wrote:
> Trond,
>
> Fess up... I'm closing in:
>
>    http://iucha.net/2.6.23-rc3/2.6.23-rc-bisect.png
>
> [Dropping Jiri and linux-usb-devel from future postings.  You are
> included now just for communicating the conclusion of this thread.]
>
> On Wed, Aug 22, 2007 at 08:22:00AM -0500, Florin Iucha wrote:
> > On Tue, Aug 21, 2007 at 03:42:26PM +0200, Jiri Kosina wrote:
> > > On Tue, 21 Aug 2007, Florin Iucha wrote:
> > >
> > > > There is another interesting angle to this: in the past, every time I
> > > > had keyboard problems, it used to be caused by the VFS and/or NFS...
> > > > after much wrangling, a bunch of bugs were fixed (Hi Trond, Peter,
> > > > Alan!). Now, after the keyboard "locked up", I used the mouse to close
> > > > the gnome session, then I logged-in remotely to reboot.  The reboot
> > > > process locked up and I need to use the reset button!  The second time
> > > > the keyboard "locked up" I listed my processes, and I noticed that I had
> > > > a couple of bash processes and a ssh process in "D" state. Something is
> > > > fishy again in the VFS ;)
> > >
> > > Yes, there were some NFS updates in between -rc2 and
> > > 28e8351ac22de25034e048c680014ad824323c65. I'd be now even more curious
> > > what are you going to find by bisect, please let us know.
> > >
> > > I added Trond to CC, full thread to be found at
> > > http://lkml.org/lkml/2007/8/21/151 for reference.
> > >
> > > Florin, it also might be useful to capture the states of stuck processess
> > > via alt-sysrq-T (or better by echo t > /proc/sysrq-trigger), so that we
> > > know better where are they stuck.
> >
> > This morning it took a bit longer to hang, but it happened.  The
> > backtraces are at http://iucha.net/2.6.23-rc3/backtraces.gz .
> >
> > I'll try a bisect session this weekend.
>
> florin@zeus $ git bisect bad
> Bisecting: 5 revisions left to test after this
> florin@zeus $ git bisect log
> git-bisect start
> # good: [d4ac2477fad0f2680e84ec12e387ce67682c5c13] Linux 2.6.23-rc2
> git-bisect good d4ac2477fad0f2680e84ec12e387ce67682c5c13
> # bad: [28e8351ac22de25034e048c680014ad824323c65] Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes
> git-bisect bad 28e8351ac22de25034e048c680014ad824323c65
> # bad: [8f2ea1fd3f97ab7a809e939b5b9005a16f862439] [POWERPC] Fix initialization and usage of dma_mask
> git-bisect bad 8f2ea1fd3f97ab7a809e939b5b9005a16f862439
> # good: [ff95f3df54609d9d4b9572f8a67d09922a645043] sched: remove the 'u64 now' parameter from pick_next_task()
> git-bisect good ff95f3df54609d9d4b9572f8a67d09922a645043
> # good: [be12014dd7750648fde33e1e45cac24dc9a8be6d] Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched
> git-bisect good be12014dd7750648fde33e1e45cac24dc9a8be6d
> # good: [6a0ed91e361a93ee1efb4c20c4967024ed2a8dd7] hexdump: use const notation
> git-bisect good 6a0ed91e361a93ee1efb4c20c4967024ed2a8dd7
> # bad: [6adb31c90c47262c8a25bf5097de9b3426caf3ae] remove dubious legal statment from uio-howto
> git-bisect bad 6adb31c90c47262c8a25bf5097de9b3426caf3ae
>
> florin
>
> --
> Bruce Schneier expects the Spanish Inquisition.
>       http://geekz.co.uk/schneierfacts/fact/163
>

This sounds a lot like the post I made yesterday titled 'nfs4 hang regression'.
I tracked it down to commit 3d39c691ff486142dd9aaeac12f553f4476b7a6


* Re: NFS woes again Was: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351
  2007-08-23 17:14                       ` Bret Towe
@ 2007-08-23 17:36                         ` Florin Iucha
  2007-08-27 13:17                           ` Trond Myklebust
  0 siblings, 1 reply; 35+ messages in thread
From: Florin Iucha @ 2007-08-23 17:36 UTC (permalink / raw)
  To: Bret Towe; +Cc: Trond Myklebust, Linux Kernel Mailing List, Michal Piotrowski

[-- Attachment #1: Type: text/plain, Size: 573 bytes --]

On Thu, Aug 23, 2007 at 10:14:38AM -0700, Bret Towe wrote:
> this sounds alot like the post i did yesterday titled 'nfs4 hang regression'
> i tracked it down to commit 3d39c691ff486142dd9aaeac12f553f4476b7a6

Yes, it certainly does -- all the symptoms match!

I'm not [alone in] seeing dead keyboards!

Now, if only somebody could clarify to me the connection between
the bad NFS4 shooting the keyboard but not the mouse, that would
be wonderful.

florin

-- 
Bruce Schneier expects the Spanish Inquisition.
      http://geekz.co.uk/schneierfacts/fact/163

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]


* Re: NFS woes again Was: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351
  2007-08-23 17:36                         ` Florin Iucha
@ 2007-08-27 13:17                           ` Trond Myklebust
  2007-08-28  1:19                             ` Bret Towe
  0 siblings, 1 reply; 35+ messages in thread
From: Trond Myklebust @ 2007-08-27 13:17 UTC (permalink / raw)
  To: Florin Iucha; +Cc: Bret Towe, Linux Kernel Mailing List, Michal Piotrowski

[-- Attachment #1: Type: text/plain, Size: 625 bytes --]

On Thu, 2007-08-23 at 12:36 -0500, Florin Iucha wrote:
> On Thu, Aug 23, 2007 at 10:14:38AM -0700, Bret Towe wrote:
> > this sounds alot like the post i did yesterday titled 'nfs4 hang regression'
> > i tracked it down to commit 3d39c691ff486142dd9aaeac12f553f4476b7a6
> 
> Yes, it certainly does -- all the symptoms match!
> 
> I'm not [alone in] seeing dead keyboards!
> 
> Now, if only somebody could clarify to me the connection between
> the bad NFS4 shooting the keyboard but not the mouse, that would
> be wonderful.
> 
> florin

Could you and Bret please check if the attached patch fixes the hang?

Cheers
  Trond



[-- Attachment #2: linux-2.6.23-001-fix_cancel_work_hang.dif --]
[-- Type: message/rfc822, Size: 2327 bytes --]

From: Trond Myklebust <Trond.Myklebust@netapp.com>
Subject: No Subject
Date: Mon, 27 Aug 2007 09:14:56 -0400
Message-ID: <1188220651.6701.33.camel@heimdal.trondhjem.org>

We need to ensure that nobody adds anything to nfs_automount_list while we
are killing off the work queue entry, or else nfs_expire_automounts will
simply rearm it, and we hang.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/namespace.c |   14 +++++++++++++-
 1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/namespace.c b/fs/nfs/namespace.c
index aea76d0..bcd0777 100644
--- a/fs/nfs/namespace.c
+++ b/fs/nfs/namespace.c
@@ -22,6 +22,11 @@ static void nfs_expire_automounts(struct work_struct *work);
 
 LIST_HEAD(nfs_automount_list);
 static DECLARE_DELAYED_WORK(nfs_automount_task, nfs_expire_automounts);
+/*
+ * The following mutex prevents nfs_follow_mountpoint from adding new
+ * entries to nfs_automount_list
+ */
+static DEFINE_MUTEX(nfs_automount_mutex);
 int nfs_mountpoint_expiry_timeout = 500 * HZ;
 
 static struct vfsmount *nfs_do_submount(const struct vfsmount *mnt_parent,
@@ -128,18 +133,21 @@ static void * nfs_follow_mountpoint(struct dentry *dentry, struct nameidata *nd)
 		goto out_err;
 
 	mntget(mnt);
+	mutex_lock(&nfs_automount_mutex);
 	err = do_add_mount(mnt, nd, nd->mnt->mnt_flags|MNT_SHRINKABLE, &nfs_automount_list);
 	if (err < 0) {
+		mutex_unlock(&nfs_automount_mutex);
 		mntput(mnt);
 		if (err == -EBUSY)
 			goto out_follow;
 		goto out_err;
 	}
+	schedule_delayed_work(&nfs_automount_task, nfs_mountpoint_expiry_timeout);
+	mutex_unlock(&nfs_automount_mutex);
 	mntput(nd->mnt);
 	dput(nd->dentry);
 	nd->mnt = mnt;
 	nd->dentry = dget(mnt->mnt_root);
-	schedule_delayed_work(&nfs_automount_task, nfs_mountpoint_expiry_timeout);
 out:
 	dprintk("%s: done, returned %d\n", __FUNCTION__, err);
 
@@ -175,8 +183,12 @@ static void nfs_expire_automounts(struct work_struct *work)
 
 void nfs_release_automount_timer(void)
 {
+	if (!list_empty(&nfs_automount_list))
+		return;
+	mutex_lock(&nfs_automount_mutex);
 	if (list_empty(&nfs_automount_list))
 		cancel_delayed_work_sync(&nfs_automount_task);
+	mutex_unlock(&nfs_automount_mutex);
 }
 
 /*


* Re: NFS woes again Was: [linux-usb-devel] USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351
  2007-08-27 13:17                           ` Trond Myklebust
@ 2007-08-28  1:19                             ` Bret Towe
  2007-08-28  1:35                               ` NFS woes again Florin Iucha
  0 siblings, 1 reply; 35+ messages in thread
From: Bret Towe @ 2007-08-28  1:19 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Florin Iucha, Linux Kernel Mailing List, Michal Piotrowski

On 8/27/07, Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> On Thu, 2007-08-23 at 12:36 -0500, Florin Iucha wrote:
> > On Thu, Aug 23, 2007 at 10:14:38AM -0700, Bret Towe wrote:
> > > this sounds alot like the post i did yesterday titled 'nfs4 hang regression'
> > > i tracked it down to commit 3d39c691ff486142dd9aaeac12f553f4476b7a6
> >
> > Yes, it certainly does -- all the symptoms match!
> >
> > I'm not [alone in] seeing dead keyboards!
> >
> > Now, if only somebody could clarify to me the connection between
> > the bad NFS4 shooting the keyboard but not the mouse, that would
> > be wonderful.
> >
> > florin
>
> Could you and Bret please check if the attached patch fixes the hang?

No good for me; it still hangs after ~30 minutes.

> Cheers
>   Trond
>
>
>
>
> ---------- Forwarded message ----------
> From: Trond Myklebust <Trond.Myklebust@netapp.com>
> To:
> Date: Mon, 27 Aug 2007 09:14:56 -0400
> Subject: No Subject
> We need to ensure that nobody adds anything to nfs_automount_list while we
> are killing off the work queue entry, or else nfs_expire_automounts will
> simply rearm it, and we hang.
>
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> ---
>
>  fs/nfs/namespace.c |   14 +++++++++++++-
>  1 files changed, 13 insertions(+), 1 deletions(-)
>
> diff --git a/fs/nfs/namespace.c b/fs/nfs/namespace.c
> index aea76d0..bcd0777 100644
> --- a/fs/nfs/namespace.c
> +++ b/fs/nfs/namespace.c
> @@ -22,6 +22,11 @@ static void nfs_expire_automounts(struct work_struct *work);
>
>  LIST_HEAD(nfs_automount_list);
>  static DECLARE_DELAYED_WORK(nfs_automount_task, nfs_expire_automounts);
> +/*
> + * The following mutex prevents nfs_follow_mountpoint from adding new
> + * entries to nfs_automount_list
> + */
> +static DEFINE_MUTEX(nfs_automount_mutex);
>  int nfs_mountpoint_expiry_timeout = 500 * HZ;
>
>  static struct vfsmount *nfs_do_submount(const struct vfsmount *mnt_parent,
> @@ -128,18 +133,21 @@ static void * nfs_follow_mountpoint(struct dentry *dentry, struct nameidata *nd)
>                 goto out_err;
>
>         mntget(mnt);
> +       mutex_lock(&nfs_automount_mutex);
>         err = do_add_mount(mnt, nd, nd->mnt->mnt_flags|MNT_SHRINKABLE, &nfs_automount_list);
>         if (err < 0) {
> +               mutex_unlock(&nfs_automount_mutex);
>                 mntput(mnt);
>                 if (err == -EBUSY)
>                         goto out_follow;
>                 goto out_err;
>         }
> +       schedule_delayed_work(&nfs_automount_task, nfs_mountpoint_expiry_timeout);
> +       mutex_unlock(&nfs_automount_mutex);
>         mntput(nd->mnt);
>         dput(nd->dentry);
>         nd->mnt = mnt;
>         nd->dentry = dget(mnt->mnt_root);
> -       schedule_delayed_work(&nfs_automount_task, nfs_mountpoint_expiry_timeout);
>  out:
>         dprintk("%s: done, returned %d\n", __FUNCTION__, err);
>
> @@ -175,8 +183,12 @@ static void nfs_expire_automounts(struct work_struct *work)
>
>  void nfs_release_automount_timer(void)
>  {
> +       if (!list_empty(&nfs_automount_list))
> +               return;
> +       mutex_lock(&nfs_automount_mutex);
>         if (list_empty(&nfs_automount_list))
>                 cancel_delayed_work_sync(&nfs_automount_task);
> +       mutex_unlock(&nfs_automount_mutex);
>  }
>
>  /*
>
>


* Re: NFS woes again
  2007-08-28  1:19                             ` Bret Towe
@ 2007-08-28  1:35                               ` Florin Iucha
  2007-08-28 13:28                                 ` Trond Myklebust
  0 siblings, 1 reply; 35+ messages in thread
From: Florin Iucha @ 2007-08-28  1:35 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Bret Towe, Linux Kernel Mailing List, Michal Piotrowski

[-- Attachment #1: Type: text/plain, Size: 820 bytes --]

On Mon, Aug 27, 2007 at 06:19:29PM -0700, Bret Towe wrote:
> On 8/27/07, Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> > > > this sounds alot like the post i did yesterday titled 'nfs4 hang regression'
> > > > i tracked it down to commit 3d39c691ff486142dd9aaeac12f553f4476b7a6
> > >
> > > Yes, it certainly does -- all the symptoms match!
> >
> > Could you and Bret please check if the attached patch fixes the hang?
>
> no good for me still hangs after ~30minutes

I just booted into the new kernel
(3d39c691ff486142dd9aaeac12f553f4476b7a6 + Trond's patch) and it hangs
in 10-15 minutes.

Process traces available at http://iucha.net/nfs/23-rc2-nfs-fix-1/kernel.log.gz

Regards,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
      http://geekz.co.uk/schneierfacts/fact/163

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]


* Re: NFS woes again
  2007-08-28  1:35                               ` NFS woes again Florin Iucha
@ 2007-08-28 13:28                                 ` Trond Myklebust
  2007-08-29  3:27                                   ` Florin Iucha
  2007-08-29  5:52                                   ` Bret Towe
  0 siblings, 2 replies; 35+ messages in thread
From: Trond Myklebust @ 2007-08-28 13:28 UTC (permalink / raw)
  To: Florin Iucha; +Cc: Bret Towe, Linux Kernel Mailing List, Michal Piotrowski

[-- Attachment #1: Type: text/plain, Size: 967 bytes --]

On Mon, 2007-08-27 at 20:35 -0500, Florin Iucha wrote:
> On Mon, Aug 27, 2007 at 06:19:29PM -0700, Bret Towe wrote:
> > On 8/27/07, Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> > > > > this sounds alot like the post i did yesterday titled 'nfs4 hang regression'
> > > > > i tracked it down to commit 3d39c691ff486142dd9aaeac12f553f4476b7a6
> > > >
> > > > Yes, it certainly does -- all the symptoms match!
> > >
> > > Could you and Bret please check if the attached patch fixes the hang?
> >
> > no good for me still hangs after ~30minutes
> 
> I just booted into the new kernel
> (3d39c691ff486142dd9aaeac12f553f4476b7a6 + Trond's patch) and it hangs
> in 10-15 minutes.
> 
> Process traces available at http://iucha.net/nfs/23-rc2-nfs-fix-1/kernel.log.gz
> 
> Regards,
> florin

Doh! I see the problem: cancel_delayed_work_sync() shouldn't ever be
called recursively.

The following patch should be correct. Please just discard the previous
one...

Trond


[-- Attachment #2: linux-2.6.23-001-fix_cancel_work_hang.dif --]
[-- Type: message/rfc822, Size: 979 bytes --]

From: Trond Myklebust <Trond.Myklebust@netapp.com>
Subject: No Subject
Date: Mon, 27 Aug 2007 09:14:56 -0400
Message-ID: <1188307723.6701.139.camel@heimdal.trondhjem.org>

Doh! We can't use cancel_delayed_work_sync because we may have been called
from an unmount that was being performed by nfs_automount_task.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/namespace.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/namespace.c b/fs/nfs/namespace.c
index aea76d0..acfc56f 100644
--- a/fs/nfs/namespace.c
+++ b/fs/nfs/namespace.c
@@ -176,7 +176,7 @@ static void nfs_expire_automounts(struct work_struct *work)
 void nfs_release_automount_timer(void)
 {
 	if (list_empty(&nfs_automount_list))
-		cancel_delayed_work_sync(&nfs_automount_task);
+		cancel_delayed_work(&nfs_automount_task);
 }
 
 /*
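
The hang described in this patch can be modelled in user space. The sketch
below is only an illustration under stated assumptions: model_work,
model_cancel() and model_cancel_sync() are hypothetical names, not kernel
APIs, and the model reports the self-wait instead of actually blocking. It
shows why a cancel that waits for a running handler cannot complete when the
caller is that handler, which is the situation when
nfs_release_automount_timer() is reached from an unmount performed by
nfs_automount_task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_work {
	bool pending;	/* queued, timer has not fired yet */
	bool running;	/* handler is executing right now */
};

enum cancel_result { CANCELLED, NOTHING_PENDING, WOULD_DEADLOCK };

/* Models a non-waiting cancel: drops a pending item and never waits. */
static enum cancel_result model_cancel(struct model_work *w)
{
	assert(w != NULL);
	if (!w->pending)
		return NOTHING_PENDING;
	w->pending = false;
	return CANCELLED;
}

/*
 * Models a waiting ("sync") cancel: after dropping a pending item it must
 * also wait for a running handler to finish. If the caller is that very
 * handler, the wait can never end; the model reports this case instead of
 * blocking forever.
 */
static enum cancel_result model_cancel_sync(struct model_work *w,
					    bool called_from_handler)
{
	enum cancel_result r = model_cancel(w);

	if (w->running && called_from_handler)
		return WOULD_DEADLOCK;
	return r;
}
```

With running set and the call made from the handler, model_cancel_sync()
returns WOULD_DEADLOCK, while model_cancel() returns immediately. That is
the behaviour the switch to cancel_delayed_work() relies on.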

* Re: NFS woes again
  2007-08-28 13:28                                 ` Trond Myklebust
@ 2007-08-29  3:27                                   ` Florin Iucha
  2007-08-29  5:52                                   ` Bret Towe
  1 sibling, 0 replies; 35+ messages in thread
From: Florin Iucha @ 2007-08-29  3:27 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Bret Towe, Linux Kernel Mailing List, Michal Piotrowski

[-- Attachment #1: Type: text/plain, Size: 495 bytes --]

On Tue, Aug 28, 2007 at 09:28:43AM -0400, Trond Myklebust wrote:
> Doh! I see the problem: cancel_delayed_work_sync() shouldn't ever be
> called recursively.
> 
> The following patch should be correct. Please just discard the previous
> one...

So far so good.  This patch got one hour uptime...  I'll stay with
this kernel for a few days, to keep an eye on it.

Thanks,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
      http://geekz.co.uk/schneierfacts/fact/163

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

* Re: NFS woes again
  2007-08-28 13:28                                 ` Trond Myklebust
  2007-08-29  3:27                                   ` Florin Iucha
@ 2007-08-29  5:52                                   ` Bret Towe
  2007-08-30 22:18                                     ` Bret Towe
  1 sibling, 1 reply; 35+ messages in thread
From: Bret Towe @ 2007-08-29  5:52 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Florin Iucha, Linux Kernel Mailing List, Michal Piotrowski

On 8/28/07, Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> On Mon, 2007-08-27 at 20:35 -0500, Florin Iucha wrote:
> > On Mon, Aug 27, 2007 at 06:19:29PM -0700, Bret Towe wrote:
> > > On 8/27/07, Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> > > > > > this sounds alot like the post i did yesterday titled 'nfs4 hang regression'
> > > > > > i tracked it down to commit 3d39c691ff486142dd9aaeac12f553f4476b7a6
> > > > >
> > > > > Yes, it certainly does -- all the symptoms match!
> > > >
> > > > Could you and Bret please check if the attached patch fixes the hang?
> > >
> > > no good for me still hangs after ~30minutes
> >
> > I just booted into the new kernel
> > (3d39c691ff486142dd9aaeac12f553f4476b7a6 + Trond's patch) and it hangs
> > in 10-15 minutes.
> >
> > Process traces available at http://iucha.net/nfs/23-rc2-nfs-fix-1/kernel.log.gz
> >
> > Regards,
> > florin
>
> Doh! I see the problem: cancel_delayed_work_sync() shouldn't ever be
> called recursively.
>
> The following patch should be correct. Please just discard the previous
> one...
>
> Trond
>

uptime of 3 hours and keyboard is still working fine
I'll hopefully get to test this on the mini tomorrow for at least 3 hours also

>
> ---------- Forwarded message ----------
> From: Trond Myklebust <Trond.Myklebust@netapp.com>
> To:
> Date: Mon, 27 Aug 2007 09:14:56 -0400
> Subject: No Subject
> Doh! We can't use cancel_delayed_work_sync because we may have been called
> from an unmount that was being performed by nfs_automount_task.
>
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> ---
>
>  fs/nfs/namespace.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/fs/nfs/namespace.c b/fs/nfs/namespace.c
> index aea76d0..acfc56f 100644
> --- a/fs/nfs/namespace.c
> +++ b/fs/nfs/namespace.c
> @@ -176,7 +176,7 @@ static void nfs_expire_automounts(struct work_struct *work)
>  void nfs_release_automount_timer(void)
>  {
>         if (list_empty(&nfs_automount_list))
> -               cancel_delayed_work_sync(&nfs_automount_task);
> +               cancel_delayed_work(&nfs_automount_task);
>  }
>
>  /*
>
>

* Re: NFS woes again
  2007-08-29  5:52                                   ` Bret Towe
@ 2007-08-30 22:18                                     ` Bret Towe
  2007-08-30 23:14                                       ` Florin Iucha
  0 siblings, 1 reply; 35+ messages in thread
From: Bret Towe @ 2007-08-30 22:18 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Florin Iucha, Linux Kernel Mailing List, Michal Piotrowski

On 8/28/07, Bret Towe <magnade@gmail.com> wrote:
> On 8/28/07, Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> > On Mon, 2007-08-27 at 20:35 -0500, Florin Iucha wrote:
> > > On Mon, Aug 27, 2007 at 06:19:29PM -0700, Bret Towe wrote:
> > > > On 8/27/07, Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> > > > > > > this sounds alot like the post i did yesterday titled 'nfs4 hang regression'
> > > > > > > i tracked it down to commit 3d39c691ff486142dd9aaeac12f553f4476b7a6
> > > > > >
> > > > > > Yes, it certainly does -- all the symptoms match!
> > > > >
> > > > > Could you and Bret please check if the attached patch fixes the hang?
> > > >
> > > > no good for me still hangs after ~30minutes
> > >
> > > I just booted into the new kernel
> > > (3d39c691ff486142dd9aaeac12f553f4476b7a6 + Trond's patch) and it hangs
> > > in 10-15 minutes.
> > >
> > > Process traces available at http://iucha.net/nfs/23-rc2-nfs-fix-1/kernel.log.gz
> > >
> > > Regards,
> > > florin
> >
> > Doh! I see the problem: cancel_delayed_work_sync() shouldn't ever be
> > called recursively.
> >
> > The following patch should be correct. Please just discard the previous
> > one...
> >
> > Trond
> >
>
> uptime of 3 hours and keyboard is still working fine
> I'll hopefully get to test this on the mini tomorrow for at least 3 hours also

got 45min on mini before I had to go elsewhere
the amd64 shut down fine and has been up for more than 3 hours
I'd say the patch does it

> >
> > ---------- Forwarded message ----------
> > From: Trond Myklebust <Trond.Myklebust@netapp.com>
> > To:
> > Date: Mon, 27 Aug 2007 09:14:56 -0400
> > Subject: No Subject
> > Doh! We can't use cancel_delayed_work_sync because we may have been called
> > from an unmount that was being performed by nfs_automount_task.
> >
> > Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> > ---
> >
> >  fs/nfs/namespace.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/fs/nfs/namespace.c b/fs/nfs/namespace.c
> > index aea76d0..acfc56f 100644
> > --- a/fs/nfs/namespace.c
> > +++ b/fs/nfs/namespace.c
> > @@ -176,7 +176,7 @@ static void nfs_expire_automounts(struct work_struct *work)
> >  void nfs_release_automount_timer(void)
> >  {
> >         if (list_empty(&nfs_automount_list))
> > -               cancel_delayed_work_sync(&nfs_automount_task);
> > +               cancel_delayed_work(&nfs_automount_task);
> >  }
> >
> >  /*
> >
> >
>

* Re: NFS woes again
  2007-08-30 22:18                                     ` Bret Towe
@ 2007-08-30 23:14                                       ` Florin Iucha
  0 siblings, 0 replies; 35+ messages in thread
From: Florin Iucha @ 2007-08-30 23:14 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Bret Towe, Linux Kernel Mailing List, Michal Piotrowski

[-- Attachment #1: Type: text/plain, Size: 581 bytes --]

On Thu, Aug 30, 2007 at 03:18:37PM -0700, Bret Towe wrote:
> > uptime of 3 hours and keyboard is still working fine
> > I'll hopefully get to test this on the mini tomorrow for at least 3 hours also
> 
> got 45min on mini before I had to go elsewhere
> the amd64 shut down fine and has been up for more than 3 hours
> I'd say the patch does it

Yup.  Same here.  Many startups, shutdowns and minutes of uptime,
with no problems observed.  Check it in!

Thanks,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
      http://geekz.co.uk/schneierfacts/fact/163

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

Thread overview: 35+ messages
2007-08-15 13:31 USB-related oops in sysfs with linux v2.6.23-rc3-50-g28e8351 Florin Iucha
2007-08-15 14:38 ` [linux-usb-devel] " Alan Stern
2007-08-15 14:50   ` Florin Iucha
2007-08-15 15:24     ` Alan Stern
2007-08-15 14:54   ` Tejun Heo
2007-08-15 15:21     ` Cornelia Huck
2007-08-15 15:30       ` Tejun Heo
2007-08-15 15:33     ` Alan Stern
2007-08-15 14:49 ` Jiri Kosina
2007-08-15 14:53   ` Florin Iucha
2007-08-15 14:58     ` Jiri Kosina
2007-08-21 11:51       ` Florin Iucha
2007-08-21 12:04         ` Jiri Kosina
2007-08-21 12:28           ` Florin Iucha
2007-08-21 14:51           ` Alan Stern
2007-08-21 12:06         ` Oliver Neukum
2007-08-21 12:09           ` Jiri Kosina
2007-08-21 12:19           ` Oliver Neukum
2007-08-21 12:57         ` Florin Iucha
2007-08-21 13:05           ` Jiri Kosina
2007-08-21 13:17             ` Florin Iucha
2007-08-21 13:27               ` Florin Iucha
2007-08-21 13:42                 ` Jiri Kosina
2007-08-22 13:22                   ` Florin Iucha
2007-08-23 12:52                     ` NFS woes again Was: " Florin Iucha
2007-08-23 17:14                       ` Bret Towe
2007-08-23 17:36                         ` Florin Iucha
2007-08-27 13:17                           ` Trond Myklebust
2007-08-28  1:19                             ` Bret Towe
2007-08-28  1:35                               ` NFS woes again Florin Iucha
2007-08-28 13:28                                 ` Trond Myklebust
2007-08-29  3:27                                   ` Florin Iucha
2007-08-29  5:52                                   ` Bret Towe
2007-08-30 22:18                                     ` Bret Towe
2007-08-30 23:14                                       ` Florin Iucha