LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
@ 2008-10-17 12:32 Max Kellermann
2008-10-17 14:33 ` Glauber Costa
2008-10-20 6:27 ` Ian Campbell
0 siblings, 2 replies; 50+ messages in thread
From: Max Kellermann @ 2008-10-17 12:32 UTC (permalink / raw)
To: linux-kernel, gcosta, ijc
Hi,
Ian: this is a follow-up to your post "NFS regression? Odd delays and
lockups accessing an NFS export" a few weeks ago
(http://lkml.org/lkml/2008/9/27/42).
I am able to trigger this bug within a few minutes on a customer's
machine (large web hoster, a *lot* of NFS traffic).
Symptom: with 2.6.26 (2.6.27.1, too), load goes to 100+, dmesg says
"INFO: task migration/2:9 blocked for more than 120 seconds." with
varying task names. Except for the high load average, the machine
seems to work.
With git bisect, I was finally able to identify the guilty commit,
it's not "Ensure we zap only the access and acl caches when setting
new acls" like you guessed, Ian. According to my bisect,
6becedbb06072c5741d4057b9facecb4b3143711 is the origin of the problem.
e481fcf8563d300e7f8875cae5fdc41941d29de0 (its parent) works well.
Glauber: that is your patch "x86: minor adjustments for do_boot_cpu"
(http://lkml.org/lkml/2008/3/19/143). I don't understand this patch
well, and I fail to see a connection with the symptom, but maybe
somebody else does...
See patch below (applies to 2.6.27.1). So far, it looks like the
problem is solved on the server, no visible side effects.
Max
Revert "x86: minor adjustments for do_boot_cpu"
According to a bisect, Glauber Costa's patch induced high load and
"task ... blocked for more than 120 seconds" messages in dmesg. This
patch reverts 6becedbb06072c5741d4057b9facecb4b3143711.
Signed-off-by: Max Kellermann <mk@cm4all.com>
---
arch/x86/kernel/smpboot.c | 21 ++++++++-------------
1 files changed, 8 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 7985c5b..789cf84 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -808,7 +808,7 @@ static int __cpuinit do_boot_cpu(int apicid, int cpu)
* Returns zero if CPU booted OK, else error code from wakeup_secondary_cpu.
*/
{
- unsigned long boot_error = 0;
+ unsigned long boot_error;
int timeout;
unsigned long start_ip;
unsigned short nmi_high = 0, nmi_low = 0;
@@ -828,7 +828,11 @@ static int __cpuinit do_boot_cpu(int apicid, int cpu)
}
#endif
- alternatives_smp_switch(1);
+ /*
+ * Save current MTRR state in case it was changed since early boot
+ * (e.g. by the ACPI SMI) to initialize new CPUs with MTRRs in sync:
+ */
+ mtrr_save_state();
c_idle.idle = get_idle_for_cpu(cpu);
@@ -873,6 +877,8 @@ do_rest:
/* start_ip had better be page-aligned! */
start_ip = setup_trampoline();
+ alternatives_smp_switch(1);
+
/* So we see what's up */
printk(KERN_INFO "Booting processor %d/%d ip %lx\n",
cpu, apicid, start_ip);
@@ -891,11 +897,6 @@ do_rest:
store_NMI_vector(&nmi_high, &nmi_low);
smpboot_setup_warm_reset_vector(start_ip);
- /*
- * Be paranoid about clearing APIC errors.
- */
- apic_write(APIC_ESR, 0);
- apic_read(APIC_ESR);
}
/*
@@ -986,12 +987,6 @@ int __cpuinit native_cpu_up(unsigned int cpu)
return -ENOSYS;
}
- /*
- * Save current MTRR state in case it was changed since early boot
- * (e.g. by the ACPI SMI) to initialize new CPUs with MTRRs in sync:
- */
- mtrr_save_state();
-
per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
#ifdef CONFIG_X86_32
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2008-10-17 12:32 [PATCH] NFS regression in 2.6.26?, "task blocked for more than 120 seconds" Max Kellermann
@ 2008-10-17 14:33 ` Glauber Costa
2008-10-20 6:51 ` Max Kellermann
2008-10-20 6:27 ` Ian Campbell
1 sibling, 1 reply; 50+ messages in thread
From: Glauber Costa @ 2008-10-17 14:33 UTC (permalink / raw)
To: Max Kellermann; +Cc: linux-kernel, gcosta, ijc
On Fri, Oct 17, 2008 at 02:32:07PM +0200, Max Kellermann wrote:
> Hi,
>
> Ian: this is a follow-up to your post "NFS regression? Odd delays and
> lockups accessing an NFS export" a few weeks ago
> (http://lkml.org/lkml/2008/9/27/42).
>
> I am able to trigger this bug within a few minutes on a customer's
> machine (large web hoster, a *lot* of NFS traffic).
>
> Symptom: with 2.6.26 (2.6.27.1, too), load goes to 100+, dmesg says
> "INFO: task migration/2:9 blocked for more than 120 seconds." with
> varying task names. Except for the high load average, the machine
> seems to work.
>
> With git bisect, I was finally able to identify the guilty commit,
> it's not "Ensure we zap only the access and acl caches when setting
> new acls" like you guessed, Ian. According to my bisect,
> 6becedbb06072c5741d4057b9facecb4b3143711 is the origin of the problem.
> e481fcf8563d300e7f8875cae5fdc41941d29de0 (its parent) works well.
>
> Glauber: that is your patch "x86: minor adjustments for do_boot_cpu"
> (http://lkml.org/lkml/2008/3/19/143). I don't understand this patch
> well, and I fail to see a connection with the symptom, but maybe
> somebody else does...
>
> See patch below (applies to 2.6.27.1). So far, it looks like the
> problem is solved on the server, no visible side effects.
>
> Max
That's probably something related to apic congestion.
Does the problem go away if the only thing you change is this:
> @@ -891,11 +897,6 @@ do_rest:
> store_NMI_vector(&nmi_high, &nmi_low);
>
> smpboot_setup_warm_reset_vector(start_ip);
> - /*
> - * Be paranoid about clearing APIC errors.
> - */
> - apic_write(APIC_ESR, 0);
> - apic_read(APIC_ESR);
> }
Please let me know.
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2008-10-17 12:32 [PATCH] NFS regression in 2.6.26?, "task blocked for more than 120 seconds" Max Kellermann
2008-10-17 14:33 ` Glauber Costa
@ 2008-10-20 6:27 ` Ian Campbell
2008-11-01 11:45 ` Ian Campbell
1 sibling, 1 reply; 50+ messages in thread
From: Ian Campbell @ 2008-10-20 6:27 UTC (permalink / raw)
To: Max Kellermann
Cc: linux-kernel, gcosta, Grant Coady, Trond Myklebust,
J. Bruce Fields, Tom Tucker
[-- Attachment #1: Type: text/plain, Size: 3490 bytes --]
(adding back some CC's, please don't drop people)
On Fri, 2008-10-17 at 14:32 +0200, Max Kellermann wrote:
> Ian: this is a follow-up to your post "NFS regression? Odd delays and
> lockups accessing an NFS export" a few weeks ago
> (http://lkml.org/lkml/2008/9/27/42).
>
> I am able to trigger this bug within a few minutes on a customer's
> machine (large web hoster, a *lot* of NFS traffic).
>
> Symptom: with 2.6.26 (2.6.27.1, too), load goes to 100+, dmesg says
> "INFO: task migration/2:9 blocked for more than 120 seconds." with
> varying task names. Except for the high load average, the machine
> seems to work.
>
> With git bisect, I was finally able to identify the guilty commit,
> it's not "Ensure we zap only the access and acl caches when setting
> new acls" like you guessed, Ian. According to my bisect,
> 6becedbb06072c5741d4057b9facecb4b3143711 is the origin of the problem.
> e481fcf8563d300e7f8875cae5fdc41941d29de0 (its parent) works well.
The issue I see still occurs well before those changesets. I have seen
it with v2.6.25 but v2.6.24 survived for 7 days without issue (my
threshold for a good kernel is 7 days, hence bisecting is a bit
slow...).
So far I have bisected down to this range and am currently testing
acee478 which has been up for >4days.
$ git bisect visualize --pretty=oneline
bdc7f021f3a1fade77adf3c2d7f65690566fddfe NFS: Clean up the (commit|read|write)_setup() callback routines
3ff7576ddac06c3d07089e241b40826d24bbf1ac SUNRPC: Clean up the initialisation of priority queue scheduling info.
c970aa85e71bd581726c42df843f6f129db275ac SUNRPC: Clean up rpc_run_task
84115e1cd4a3614c4e566d4cce31381dce3dbef9 SUNRPC: Cleanup of rpc_task initialisation
ef818a28fac9bd214e676986d8301db0582b92a9 NFS: Stop sillyname renames and unmounts from racing
2f74c0a05612b9c2014b5b67833dba9b9f523948 NFSv4: Clean up the OPEN/CLOSE serialisation code
acee478afc6ff7e1b8852d9a4dca1ff36021414d NFS: Clean up the write request locking.
8b1f9ee56e21e505a3d5d3e33f823006d1abdbaf NFS: Optimise nfs_vm_page_mkwrite()
77f111929d024165e736e919187cff017279bebe NFS: Ensure that we eject stale inodes as soon as possible
d45b9d8baf41acb177abbbe6746b1dea094b8a28 NFS: Handle -ENOENT errors in unlink()/rmdir()/rename()
609005c319bc6062b95ed82e132884ed7e22cdb9 NFS: Sillyrename: in the case of a race, check aliases are really positive
fccca7fc6aab4e6b519e2d606ef34632e4f50e33 NFS: Fix a sillyrename race...
note that this bisect is over fs/nfs only so it's possible the I might
drop off the beginning and have to bisect the 3878 commits between
v2.6.24 and fccca7f. I hope not! acee478 looks good so far.
$ git bisect log
# bad: [4b119e21d0c66c22e8ca03df05d9de623d0eb50f] Linux 2.6.25
# good: [49914084e797530d9baaf51df9eda77babc98fa8] Linux 2.6.24
git-bisect start 'v2.6.25' 'v2.6.24' '--' 'fs/nfs'
# bad: [4c5680177012a2b5c0f3fdf58f4375dd84a1da67] NFS: Support non-IPv4 addresses in nfs_parsed_mount_data
git-bisect bad 4c5680177012a2b5c0f3fdf58f4375dd84a1da67
# bad: [d45273ed6f4613e81701c3e896d9db200c288fff] NFS: Clean up address comparison in __nfs_find_client()
git-bisect bad d45273ed6f4613e81701c3e896d9db200c288fff
# bad: [bdc7f021f3a1fade77adf3c2d7f65690566fddfe] NFS: Clean up the (commit|read|write)_setup() callback routines
git-bisect bad bdc7f021f3a1fade77adf3c2d7f65690566fddfe
Ian.
--
Ian Campbell
"It is easier to fight for principles than to live up to them."
-- Alfred Adler
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2008-10-17 14:33 ` Glauber Costa
@ 2008-10-20 6:51 ` Max Kellermann
2008-10-20 7:43 ` Ian Campbell
` (2 more replies)
0 siblings, 3 replies; 50+ messages in thread
From: Max Kellermann @ 2008-10-20 6:51 UTC (permalink / raw)
To: Glauber Costa
Cc: linux-kernel, ijc, Grant Coady, Trond Myklebust, J. Bruce Fields,
Tom Tucker
On 2008/10/17 16:33, Glauber Costa <glommer@redhat.com> wrote:
> That's probably something related to apic congestion.
> Does the problem go away if the only thing you change is this:
>
>
> > @@ -891,11 +897,6 @@ do_rest:
> > store_NMI_vector(&nmi_high, &nmi_low);
> >
> > smpboot_setup_warm_reset_vector(start_ip);
> > - /*
> > - * Be paranoid about clearing APIC errors.
> > - */
> > - apic_write(APIC_ESR, 0);
> > - apic_read(APIC_ESR);
> > }
>
>
> Please let me know.
Hello Glauber,
I have rebooted the server with 2.6.27.1 + this patchlet an hour ago.
No problems since.
Hardware: Compaq P4 Xeon server, Broadcom CMIC-WS / CIOB-X2 board.
Tell me if you need more detailed information.
On 2008/10/20 08:27, Ian Campbell <ijc@hellion.org.uk> wrote:
> The issue I see still occurs well before those changesets. I have
> seen it with v2.6.25 but v2.6.24 survived for 7 days without issue
> (my threshold for a good kernel is 7 days, hence bisecting is a bit
> slow...).
Hello Ian,
it seems we're hunting down different bugs after all. Too bad, I
hoped I could have solved your problem, too. Our machine has been
running well over the weekend with the patch I posted; with faulty
kernels, the problem would occur after a few minutes.
Max
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2008-10-20 6:51 ` Max Kellermann
@ 2008-10-20 7:43 ` Ian Campbell
2008-10-20 13:15 ` Glauber Costa
2009-05-22 20:59 ` H. Peter Anvin
2 siblings, 0 replies; 50+ messages in thread
From: Ian Campbell @ 2008-10-20 7:43 UTC (permalink / raw)
To: Max Kellermann
Cc: Glauber Costa, linux-kernel, Grant Coady, Trond Myklebust,
J. Bruce Fields, Tom Tucker
[-- Attachment #1: Type: text/plain, Size: 843 bytes --]
On Mon, 2008-10-20 at 08:51 +0200, Max Kellermann wrote:
>
> On 2008/10/20 08:27, Ian Campbell <ijc@hellion.org.uk> wrote:
> > The issue I see still occurs well before those changesets. I have
> > seen it with v2.6.25 but v2.6.24 survived for 7 days without issue
> > (my threshold for a good kernel is 7 days, hence bisecting is a bit
> > slow...).
>
> Hello Ian,
>
> it seems we're hunting down different bugs after all. Too bad, I
> hoped I could have solved your problem, too.
Thanks anyway, I'll just keep on bisecting ;-)
> Our machine has been
> running well over the weekend with the patch I posted; with faulty
> kernels, the problem would occur after a few minutes.
>
--
Ian Campbell
BOFH excuse #400:
We are Microsoft. What you are experiencing is not a problem; it is an undocumented feature.
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2008-10-20 6:51 ` Max Kellermann
2008-10-20 7:43 ` Ian Campbell
@ 2008-10-20 13:15 ` Glauber Costa
2008-10-20 14:12 ` Max Kellermann
2008-10-20 14:21 ` Cyrill Gorcunov
2009-05-22 20:59 ` H. Peter Anvin
2 siblings, 2 replies; 50+ messages in thread
From: Glauber Costa @ 2008-10-20 13:15 UTC (permalink / raw)
To: Max Kellermann
Cc: Glauber Costa, linux-kernel, ijc, Grant Coady, Trond Myklebust,
J. Bruce Fields, Tom Tucker, gorcunov
On Mon, Oct 20, 2008 at 4:51 AM, Max Kellermann <mk@cm4all.com> wrote:
> On 2008/10/17 16:33, Glauber Costa <glommer@redhat.com> wrote:
>> That's probably something related to apic congestion.
>> Does the problem go away if the only thing you change is this:
>>
>>
>> > @@ -891,11 +897,6 @@ do_rest:
>> > store_NMI_vector(&nmi_high, &nmi_low);
>> >
>> > smpboot_setup_warm_reset_vector(start_ip);
>> > - /*
>> > - * Be paranoid about clearing APIC errors.
>> > - */
>> > - apic_write(APIC_ESR, 0);
>> > - apic_read(APIC_ESR);
>> > }
>>
>>
>> Please let me know.
>
> Hello Glauber,
>
> I have rebooted the server with 2.6.27.1 + this patchlet an hour ago.
> No problems since.
>
> Hardware: Compaq P4 Xeon server, Broadcom CMIC-WS / CIOB-X2 board.
> Tell me if you need more detailed information.
>
There's a patch in flight from cyrill that probably fixes your problem:
http://lkml.org/lkml/2008/9/15/93
The checks are obviously there for a reason, and we can't just wipe
them out unconditionally ;-) So can you check please that you are also
covered by the case provided?
> On 2008/10/20 08:27, Ian Campbell <ijc@hellion.org.uk> wrote:
>> The issue I see still occurs well before those changesets. I have
>> seen it with v2.6.25 but v2.6.24 survived for 7 days without issue
>> (my threshold for a good kernel is 7 days, hence bisecting is a bit
>> slow...).
>
> Hello Ian,
>
> it seems we're hunting down different bugs after all. Too bad, I
> hoped I could have solved your problem, too. Our machine has been
> running well over the weekend with the patch I posted; with faulty
> kernels, the problem would occur after a few minutes.
>
> Max
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
--
Glauber Costa.
"Free as in Freedom"
http://glommer.net
"The less confident you are, the more serious you have to act."
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2008-10-20 13:15 ` Glauber Costa
@ 2008-10-20 14:12 ` Max Kellermann
2008-10-20 14:34 ` Cyrill Gorcunov
2008-10-20 14:21 ` Cyrill Gorcunov
1 sibling, 1 reply; 50+ messages in thread
From: Max Kellermann @ 2008-10-20 14:12 UTC (permalink / raw)
To: Glauber Costa; +Cc: linux-kernel
On 2008/10/20 15:15, Glauber Costa <glommer@gmail.com> wrote:
> There's a patch in flight from cyrill that probably fixes your
> problem: http://lkml.org/lkml/2008/9/15/93
>
> The checks are obviously there for a reason, and we can't just wipe
> them out unconditionally ;-) So can you check please that you are
> also covered by the case provided?
Looks good: booted the machine 30 minutes ago, no problems so far.
Max
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2008-10-20 13:15 ` Glauber Costa
2008-10-20 14:12 ` Max Kellermann
@ 2008-10-20 14:21 ` Cyrill Gorcunov
1 sibling, 0 replies; 50+ messages in thread
From: Cyrill Gorcunov @ 2008-10-20 14:21 UTC (permalink / raw)
To: Glauber Costa
Cc: Max Kellermann, Glauber Costa, linux-kernel, ijc, Grant Coady,
Trond Myklebust, J. Bruce Fields, Tom Tucker
[Glauber Costa - Mon, Oct 20, 2008 at 11:15:56AM -0200]
| On Mon, Oct 20, 2008 at 4:51 AM, Max Kellermann <mk@cm4all.com> wrote:
| > On 2008/10/17 16:33, Glauber Costa <glommer@redhat.com> wrote:
| >> That's probably something related to apic congestion.
| >> Does the problem go away if the only thing you change is this:
| >>
| >>
| >> > @@ -891,11 +897,6 @@ do_rest:
| >> > store_NMI_vector(&nmi_high, &nmi_low);
| >> >
| >> > smpboot_setup_warm_reset_vector(start_ip);
| >> > - /*
| >> > - * Be paranoid about clearing APIC errors.
| >> > - */
| >> > - apic_write(APIC_ESR, 0);
| >> > - apic_read(APIC_ESR);
| >> > }
| >>
| >>
| >> Please let me know.
| >
| > Hello Glauber,
| >
| > I have rebooted the server with 2.6.27.1 + this patchlet an hour ago.
| > No problems since.
| >
| > Hardware: Compaq P4 Xeon server, Broadcom CMIC-WS / CIOB-X2 board.
| > Tell me if you need more detailed information.
| >
|
| There's a patch in flight from cyrill that probably fixes your problem:
| http://lkml.org/lkml/2008/9/15/93
|
| The checks are obviously there for a reason, and we can't just wipe
| them out unconditionally ;-) So can you check please that you are also
| covered by the case provided?
Actually I'll wonder if it help. Do Xeon processors really not
have ESR register and not integrated?
...
- Cyrill -
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2008-10-20 14:12 ` Max Kellermann
@ 2008-10-20 14:34 ` Cyrill Gorcunov
0 siblings, 0 replies; 50+ messages in thread
From: Cyrill Gorcunov @ 2008-10-20 14:34 UTC (permalink / raw)
To: Max Kellermann; +Cc: Glauber Costa, linux-kernel
[Max Kellermann - Mon, Oct 20, 2008 at 04:12:58PM +0200]
| On 2008/10/20 15:15, Glauber Costa <glommer@gmail.com> wrote:
| > There's a patch in flight from cyrill that probably fixes your
| > problem: http://lkml.org/lkml/2008/9/15/93
| >
| > The checks are obviously there for a reason, and we can't just wipe
| > them out unconditionally ;-) So can you check please that you are
| > also covered by the case provided?
|
| Looks good: booted the machine 30 minutes ago, no problems so far.
|
| Max
|
Thanks Max for testing! (me -- wonders since the patch helped
for now :-)
- Cyrill -
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2008-10-20 6:27 ` Ian Campbell
@ 2008-11-01 11:45 ` Ian Campbell
2008-11-01 13:41 ` Trond Myklebust
0 siblings, 1 reply; 50+ messages in thread
From: Ian Campbell @ 2008-11-01 11:45 UTC (permalink / raw)
To: Max Kellermann
Cc: linux-kernel, gcosta, Grant Coady, Trond Myklebust,
J. Bruce Fields, Tom Tucker
[-- Attachment #1: Type: text/plain, Size: 1211 bytes --]
On Mon, 2008-10-20 at 07:27 +0100, Ian Campbell wrote:
> So far I have bisected down to this range and am currently testing
> acee478 which has been up for >4days.
Another update. It has now bisected down to a small range
7272dcd31d56580dee7693c21e369fd167e137fe SUNRPC: xprt_autoclose() should not call xprt_disconnect()
e06799f958bf7f9f8fae15f0c6f519953fb0257c SUNRPC: Use shutdown() instead of close() when disconnecting a TCP socket
ef80367071dce7d2533e79ae8f3c84ec42708dc8 SUNRPC: TCP clear XPRT_CLOSE_WAIT when the socket is closed for writes
3b948ae5be5e22532584113e2e02029519bbad8f SUNRPC: Allow the client to detect if the TCP connection is closed
67a391d72ca7efb387c30ec761a487e50a3ff085 SUNRPC: Fix TCP rebinding logic
66af1e558538137080615e7ad6d1f2f80862de01 SUNRPC: Fix a race in xs_tcp_state_change()
I'm currently testing 3b948ae5be5e22532584113e2e02029519bbad8f.
7272dcd31d56580dee7693c21e369fd167e137fe repro'd in half a day while
ef818a28fac9bd214e676986d8301db0582b92a9 (parent of
66af1e558538137080615e7ad6d1f2f80862de01) survived for 7 days.
Ian.
--
Ian Campbell
There is no delight the equal of dread. As long as it is somebody
else's.
-- Clive Barker
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2008-11-01 11:45 ` Ian Campbell
@ 2008-11-01 13:41 ` Trond Myklebust
2008-11-02 14:40 ` Ian Campbell
` (2 more replies)
0 siblings, 3 replies; 50+ messages in thread
From: Trond Myklebust @ 2008-11-01 13:41 UTC (permalink / raw)
To: Ian Campbell
Cc: Max Kellermann, linux-kernel, gcosta, Grant Coady,
J. Bruce Fields, Tom Tucker
[-- Attachment #1: Type: text/plain, Size: 1369 bytes --]
On Sat, 2008-11-01 at 11:45 +0000, Ian Campbell wrote:
> On Mon, 2008-10-20 at 07:27 +0100, Ian Campbell wrote:
> > So far I have bisected down to this range and am currently testing
> > acee478 which has been up for >4days.
>
> Another update. It has now bisected down to a small range
>
> 7272dcd31d56580dee7693c21e369fd167e137fe SUNRPC: xprt_autoclose() should not call xprt_disconnect()
> e06799f958bf7f9f8fae15f0c6f519953fb0257c SUNRPC: Use shutdown() instead of close() when disconnecting a TCP socket
> ef80367071dce7d2533e79ae8f3c84ec42708dc8 SUNRPC: TCP clear XPRT_CLOSE_WAIT when the socket is closed for writes
> 3b948ae5be5e22532584113e2e02029519bbad8f SUNRPC: Allow the client to detect if the TCP connection is closed
> 67a391d72ca7efb387c30ec761a487e50a3ff085 SUNRPC: Fix TCP rebinding logic
> 66af1e558538137080615e7ad6d1f2f80862de01 SUNRPC: Fix a race in xs_tcp_state_change()
>
> I'm currently testing 3b948ae5be5e22532584113e2e02029519bbad8f.
>
> 7272dcd31d56580dee7693c21e369fd167e137fe repro'd in half a day while
> ef818a28fac9bd214e676986d8301db0582b92a9 (parent of
> 66af1e558538137080615e7ad6d1f2f80862de01) survived for 7 days.
>
> Ian.
Have you tested with the TCP RST fix yet? It has been merged into
mainline, so it should be in the latest 2.6.28-git, but I've attached it
so you can apply it to your test kernel...
Cheers
Trond
[-- Attachment #2: linux-2.6.27-001-respond_promptly_to_socket_errors.dif --]
[-- Type: application/x-dif, Size: 4418 bytes --]
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2008-11-01 13:41 ` Trond Myklebust
@ 2008-11-02 14:40 ` Ian Campbell
2008-11-07 2:12 ` kenneth johansson
2008-11-04 19:10 ` Ian Campbell
2008-11-25 7:09 ` Ian Campbell
2 siblings, 1 reply; 50+ messages in thread
From: Ian Campbell @ 2008-11-02 14:40 UTC (permalink / raw)
To: Trond Myklebust
Cc: Max Kellermann, linux-kernel, gcosta, Grant Coady,
J. Bruce Fields, Tom Tucker
[-- Attachment #1: Type: text/plain, Size: 494 bytes --]
On Sat, 2008-11-01 at 09:41 -0400, Trond Myklebust wrote:
>
>
> Have you tested with the TCP RST fix yet? It has been merged into
> mainline, so it should be in the latest 2.6.28-git, but I've attached
> it so you can apply it to your test kernel...
I wasn't aware of it. I'll give it a go.
Thanks,
Ian.
>
--
Ian Campbell
His designs were strictly honourable, as the phrase is: that is, to rob
a lady of her fortune by way of marriage.
-- Henry Fielding, "Tom Jones"
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2008-11-01 13:41 ` Trond Myklebust
2008-11-02 14:40 ` Ian Campbell
@ 2008-11-04 19:10 ` Ian Campbell
2008-11-25 7:09 ` Ian Campbell
2 siblings, 0 replies; 50+ messages in thread
From: Ian Campbell @ 2008-11-04 19:10 UTC (permalink / raw)
To: Trond Myklebust
Cc: Max Kellermann, linux-kernel, gcosta, Grant Coady,
J. Bruce Fields, Tom Tucker
[-- Attachment #1: Type: text/plain, Size: 1827 bytes --]
On Sat, 2008-11-01 at 09:41 -0400, Trond Myklebust wrote:
> On Sat, 2008-11-01 at 11:45 +0000, Ian Campbell wrote:
> > On Mon, 2008-10-20 at 07:27 +0100, Ian Campbell wrote:
> > > So far I have bisected down to this range and am currently testing
> > > acee478 which has been up for >4days.
> >
> > Another update. It has now bisected down to a small range
> >
> > 7272dcd31d56580dee7693c21e369fd167e137fe SUNRPC: xprt_autoclose() should not call xprt_disconnect()
> > e06799f958bf7f9f8fae15f0c6f519953fb0257c SUNRPC: Use shutdown() instead of close() when disconnecting a TCP socket
> > ef80367071dce7d2533e79ae8f3c84ec42708dc8 SUNRPC: TCP clear XPRT_CLOSE_WAIT when the socket is closed for writes
> > 3b948ae5be5e22532584113e2e02029519bbad8f SUNRPC: Allow the client to detect if the TCP connection is closed
> > 67a391d72ca7efb387c30ec761a487e50a3ff085 SUNRPC: Fix TCP rebinding logic
> > 66af1e558538137080615e7ad6d1f2f80862de01 SUNRPC: Fix a race in xs_tcp_state_change()
> >
> > I'm currently testing 3b948ae5be5e22532584113e2e02029519bbad8f.
> >
> > 7272dcd31d56580dee7693c21e369fd167e137fe repro'd in half a day while
> > ef818a28fac9bd214e676986d8301db0582b92a9 (parent of
> > 66af1e558538137080615e7ad6d1f2f80862de01) survived for 7 days.
> >
> > Ian.
>
> Have you tested with the TCP RST fix yet? It has been merged into
> mainline, so it should be in the latest 2.6.28-git, but I've attached it
> so you can apply it to your test kernel...
I cherry picked 2a9e1cfa23fb62da37739af81127dab5af095d99 onto v2.6.25
and unfortunately it has not fixed the issue. I'll go back to bisecting
with 3b948ae5be5e22532584113e2e02029519bbad8f.
Ian.
>
> Cheers
> Trond
>
--
Ian Campbell
Superior ability breeds superior ambition.
-- Spock, "Space Seed", stardate 3141.9
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2008-11-02 14:40 ` Ian Campbell
@ 2008-11-07 2:12 ` kenneth johansson
0 siblings, 0 replies; 50+ messages in thread
From: kenneth johansson @ 2008-11-07 2:12 UTC (permalink / raw)
To: linux-kernel
On Sun, 02 Nov 2008 14:40:47 +0000, Ian Campbell wrote:
> On Sat, 2008-11-01 at 09:41 -0400, Trond Myklebust wrote:
>>
>>
>> Have you tested with the TCP RST fix yet? It has been merged into
>> mainline, so it should be in the latest 2.6.28-git, but I've attached
>> it so you can apply it to your test kernel...
>
> I wasn't aware of it. I'll give it a go.
>
> Thanks,
> Ian.
>>
I think I having the same problem as you. At least I have a gut feeling it's nfs related.
What good and bad versions do you have so far in your bisecting. ??
I see the problem several times a day so it should be possible
to at least try one or two versions per day.
this is on a 2.6.27.2 client to a 2.6.26.3 server.
--------
sudo grep blocked /var/log/syslog.0
Nov 5 02:06:27 duo kernel: [ 5080.947067] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 02:08:40 duo kernel: [ 5214.091071] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 02:10:49 duo kernel: [ 5342.940064] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 02:33:15 duo kernel: [ 6688.338072] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 02:35:44 duo kernel: [ 6837.588072] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 02:38:12 duo kernel: [ 6985.765070] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 02:40:37 duo kernel: [ 7130.720067] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 06:56:00 duo kernel: [22454.090070] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 07:51:38 duo kernel: [25791.279105] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 09:39:33 duo kernel: [32267.016068] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 09:41:55 duo kernel: [32408.750061] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 09:44:17 duo kernel: [32550.484061] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 09:46:37 duo kernel: [32691.144064] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 16:26:25 duo kernel: [56678.536068] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 16:28:50 duo kernel: [56823.492067] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 16:31:17 duo kernel: [56970.594061] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 16:33:44 duo kernel: [57117.697062] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 16:51:04 duo kernel: [58158.153065] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 18:00:39 duo kernel: [62256.625050] INFO: task hald-addon-stor:7110 blocked for more than 120 seconds.
Nov 5 18:15:16 duo kernel: [63210.108080] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 18:24:08 duo kernel: [63741.610074] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 19:47:09 duo kernel: [68722.698056] INFO: task hald-addon-stor:7102 blocked for more than 120 seconds.
Nov 5 19:47:53 duo kernel: [68722.698307] INFO: task hald-addon-stor:7105 blocked for more than 120 seconds.
Nov 5 19:47:53 duo kernel: [68722.698513] INFO: task hald-addon-stor:7110 blocked for more than 120 seconds.
Nov 5 19:47:53 duo kernel: [68755.984030] INFO: task scsi_eh_12:2687 blocked for more than 120 seconds.
Nov 5 22:20:11 duo kernel: [77904.265068] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 22:23:10 duo kernel: [78083.580065] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 22:46:09 duo kernel: [79462.264081] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 23:22:05 duo kernel: [81599.010033] INFO: task scsi_eh_10:2675 blocked for more than 120 seconds.
Nov 5 23:22:05 duo kernel: [81599.010253] INFO: task hald-addon-stor:7097 blocked for more than 120 seconds.
Nov 5 23:22:05 duo kernel: [81599.010468] INFO: task hald-addon-stor:7102 blocked for more than 120 seconds.
Nov 5 23:22:05 duo kernel: [81599.010674] INFO: task hald-addon-stor:7110 blocked for more than 120 seconds.
Nov 6 01:46:31 duo kernel: [90209.346061] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 6 01:46:31 duo kernel: [90242.632048] INFO: task hald-addon-stor:7102 blocked for more than 120 seconds.
Nov 6 01:46:31 duo kernel: [90242.632230] INFO: task hald-addon-stor:7105 blocked for more than 120 seconds.
Nov 6 01:46:31 duo kernel: [90242.632368] INFO: task hald-addon-stor:7110 blocked for more than 120 seconds.
Nov 6 01:46:31 duo kernel: [90275.918024] INFO: task scsi_eh_12:2687 blocked for more than 120 seconds.
Nov 6 02:11:59 duo kernel: [91812.443070] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 6 02:14:12 duo kernel: [91945.587069] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 6 02:16:50 duo kernel: [92103.427069] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 6 02:28:26 duo kernel: [92759.483053] INFO: task hald-addon-usb-:7009 blocked for more than 120 seconds.
Nov 6 02:28:26 duo kernel: [92759.483233] INFO: task hald-addon-stor:7105 blocked for more than 120 seconds.
Nov 6 02:28:26 duo kernel: [92759.483447] INFO: task hald-addon-stor:7110 blocked for more than 120 seconds.
Nov 6 02:28:26 duo kernel: [92792.769034] INFO: task scsi_eh_12:2687 blocked for more than 120 seconds.
Nov 6 02:58:49 duo kernel: [94622.425059] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2008-11-01 13:41 ` Trond Myklebust
2008-11-02 14:40 ` Ian Campbell
2008-11-04 19:10 ` Ian Campbell
@ 2008-11-25 7:09 ` Ian Campbell
2008-11-25 13:28 ` Trond Myklebust
2 siblings, 1 reply; 50+ messages in thread
From: Ian Campbell @ 2008-11-25 7:09 UTC (permalink / raw)
To: Trond Myklebust, linux-nfs
Cc: Max Kellermann, linux-kernel, gcosta, Grant Coady,
J. Bruce Fields, Tom Tucker
[-- Attachment #1: Type: text/plain, Size: 2233 bytes --]
On Sat, 2008-11-01 at 09:41 -0400, Trond Myklebust wrote:
> On Sat, 2008-11-01 at 11:45 +0000, Ian Campbell wrote:
> > On Mon, 2008-10-20 at 07:27 +0100, Ian Campbell wrote:
> > > So far I have bisected down to this range and am currently testing
> > > acee478 which has been up for >4days.
> >
> > Another update. It has now bisected down to a small range
> >
> > 7272dcd31d56580dee7693c21e369fd167e137fe SUNRPC: xprt_autoclose() should not call xprt_disconnect()
> > e06799f958bf7f9f8fae15f0c6f519953fb0257c SUNRPC: Use shutdown() instead of close() when disconnecting a TCP socket
> > ef80367071dce7d2533e79ae8f3c84ec42708dc8 SUNRPC: TCP clear XPRT_CLOSE_WAIT when the socket is closed for writes
> > 3b948ae5be5e22532584113e2e02029519bbad8f SUNRPC: Allow the client to detect if the TCP connection is closed
> > 67a391d72ca7efb387c30ec761a487e50a3ff085 SUNRPC: Fix TCP rebinding logic
> > 66af1e558538137080615e7ad6d1f2f80862de01 SUNRPC: Fix a race in xs_tcp_state_change()
> >
> > I'm currently testing 3b948ae5be5e22532584113e2e02029519bbad8f.
> >
> > 7272dcd31d56580dee7693c21e369fd167e137fe repro'd in half a day while
> > ef818a28fac9bd214e676986d8301db0582b92a9 (parent of
> > 66af1e558538137080615e7ad6d1f2f80862de01) survived for 7 days.
According to bisect:
e06799f958bf7f9f8fae15f0c6f519953fb0257c is first bad commit
commit e06799f958bf7f9f8fae15f0c6f519953fb0257c
Author: Trond Myklebust <Trond.Myklebust@netapp.com>
Date: Mon Nov 5 15:44:12 2007 -0500
SUNRPC: Use shutdown() instead of close() when disconnecting a TCP socket
By using shutdown() rather than close() we allow the RPC client to wait
for the TCP close handshake to complete before we start trying to reconnect
using the same port.
We use shutdown(SHUT_WR) only instead of shutting down both directions,
however we wait until the server has closed the connection on its side.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
I've started testing 2.6.26 + revert. It's been a long while since I
started this process so I'll also have a go at an up to date version.
Cheers,
Ian.
--
Ian Campbell
By failing to prepare, you are preparing to fail.
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2008-11-25 7:09 ` Ian Campbell
@ 2008-11-25 13:28 ` Trond Myklebust
2008-11-25 13:38 ` Ian Campbell
0 siblings, 1 reply; 50+ messages in thread
From: Trond Myklebust @ 2008-11-25 13:28 UTC (permalink / raw)
To: Ian Campbell
Cc: linux-nfs, Max Kellermann, linux-kernel, gcosta, Grant Coady,
J. Bruce Fields, Tom Tucker
On Tue, 2008-11-25 at 07:09 +0000, Ian Campbell wrote:
> On Sat, 2008-11-01 at 09:41 -0400, Trond Myklebust wrote:
> > On Sat, 2008-11-01 at 11:45 +0000, Ian Campbell wrote:
> > > On Mon, 2008-10-20 at 07:27 +0100, Ian Campbell wrote:
> > > > So far I have bisected down to this range and am currently testing
> > > > acee478 which has been up for >4days.
> > >
> > > Another update. It has now bisected down to a small range
> > >
> > > 7272dcd31d56580dee7693c21e369fd167e137fe SUNRPC: xprt_autoclose() should not call xprt_disconnect()
> > > e06799f958bf7f9f8fae15f0c6f519953fb0257c SUNRPC: Use shutdown() instead of close() when disconnecting a TCP socket
> > > ef80367071dce7d2533e79ae8f3c84ec42708dc8 SUNRPC: TCP clear XPRT_CLOSE_WAIT when the socket is closed for writes
> > > 3b948ae5be5e22532584113e2e02029519bbad8f SUNRPC: Allow the client to detect if the TCP connection is closed
> > > 67a391d72ca7efb387c30ec761a487e50a3ff085 SUNRPC: Fix TCP rebinding logic
> > > 66af1e558538137080615e7ad6d1f2f80862de01 SUNRPC: Fix a race in xs_tcp_state_change()
> > >
> > > I'm currently testing 3b948ae5be5e22532584113e2e02029519bbad8f.
> > >
> > > 7272dcd31d56580dee7693c21e369fd167e137fe repro'd in half a day while
> > > ef818a28fac9bd214e676986d8301db0582b92a9 (parent of
> > > 66af1e558538137080615e7ad6d1f2f80862de01) survived for 7 days.
>
> According to bisect:
>
> e06799f958bf7f9f8fae15f0c6f519953fb0257c is first bad commit
> commit e06799f958bf7f9f8fae15f0c6f519953fb0257c
> Author: Trond Myklebust <Trond.Myklebust@netapp.com>
> Date: Mon Nov 5 15:44:12 2007 -0500
>
> SUNRPC: Use shutdown() instead of close() when disconnecting a TCP socket
>
> By using shutdown() rather than close() we allow the RPC client to wait
> for the TCP close handshake to complete before we start trying to reconnect
> using the same port.
> We use shutdown(SHUT_WR) only instead of shutting down both directions,
> however we wait until the server has closed the connection on its side.
>
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
>
> I've started testing 2.6.26 + revert. It's been a long while since I
> started this process so I'll also have a go at an up to date version.
>
> Cheers,
That would indicate that the server is failing to close the TCP
connection when the client closes on its end.
Could you remind me what server you are using? Also, does 'netstat -t'
show connections that are stuck in the CLOSE_WAIT state when you see the
hang?
Trond
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2008-11-25 13:28 ` Trond Myklebust
@ 2008-11-25 13:38 ` Ian Campbell
2008-11-25 13:57 ` Trond Myklebust
0 siblings, 1 reply; 50+ messages in thread
From: Ian Campbell @ 2008-11-25 13:38 UTC (permalink / raw)
To: Trond Myklebust
Cc: linux-nfs, Max Kellermann, linux-kernel, gcosta, Grant Coady,
J. Bruce Fields, Tom Tucker
On Tue, 2008-11-25 at 08:28 -0500, Trond Myklebust wrote:
> On Tue, 2008-11-25 at 07:09 +0000, Ian Campbell wrote:
> > On Sat, 2008-11-01 at 09:41 -0400, Trond Myklebust wrote:
> > > On Sat, 2008-11-01 at 11:45 +0000, Ian Campbell wrote:
> > > > On Mon, 2008-10-20 at 07:27 +0100, Ian Campbell wrote:
> > > > > So far I have bisected down to this range and am currently testing
> > > > > acee478 which has been up for >4days.
> > > >
> > > > Another update. It has now bisected down to a small range
> > > >
> > > > 7272dcd31d56580dee7693c21e369fd167e137fe SUNRPC: xprt_autoclose() should not call xprt_disconnect()
> > > > e06799f958bf7f9f8fae15f0c6f519953fb0257c SUNRPC: Use shutdown() instead of close() when disconnecting a TCP socket
> > > > ef80367071dce7d2533e79ae8f3c84ec42708dc8 SUNRPC: TCP clear XPRT_CLOSE_WAIT when the socket is closed for writes
> > > > 3b948ae5be5e22532584113e2e02029519bbad8f SUNRPC: Allow the client to detect if the TCP connection is closed
> > > > 67a391d72ca7efb387c30ec761a487e50a3ff085 SUNRPC: Fix TCP rebinding logic
> > > > 66af1e558538137080615e7ad6d1f2f80862de01 SUNRPC: Fix a race in xs_tcp_state_change()
> > > >
> > > > I'm currently testing 3b948ae5be5e22532584113e2e02029519bbad8f.
> > > >
> > > > 7272dcd31d56580dee7693c21e369fd167e137fe repro'd in half a day while
> > > > ef818a28fac9bd214e676986d8301db0582b92a9 (parent of
> > > > 66af1e558538137080615e7ad6d1f2f80862de01) survived for 7 days.
> >
> > According to bisect:
> >
> > e06799f958bf7f9f8fae15f0c6f519953fb0257c is first bad commit
> > commit e06799f958bf7f9f8fae15f0c6f519953fb0257c
> > Author: Trond Myklebust <Trond.Myklebust@netapp.com>
> > Date: Mon Nov 5 15:44:12 2007 -0500
> >
> > SUNRPC: Use shutdown() instead of close() when disconnecting a TCP socket
> >
> > By using shutdown() rather than close() we allow the RPC client to wait
> > for the TCP close handshake to complete before we start trying to reconnect
> > using the same port.
> > We use shutdown(SHUT_WR) only instead of shutting down both directions,
> > however we wait until the server has closed the connection on its side.
> >
> > Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> >
> > I've started testing 2.6.26 + revert. It's been a long while since I
> > started this process so I'll also have a go at an up to date version.
> >
> > Cheers,
>
> That would indicate that the server is failing to close the TCP
> connection when the client closes on its end.
>
> Could you remind me what server you are using?
2.6.25-2-486 which is a Debian package from backports.org, changelog
indicates that it contains 2.6.25.7.
> Also, does 'netstat -t'
> show connections that are stuck in the CLOSE_WAIT state when you see the
> hang?
I'd have to wait for it to reproduce again to be 100% sure but according
to http://lkml.indiana.edu/hypermail/linux/kernel/0808.3/0120.html
I was seeing connections in FIN_WAIT2 but not CLOSE_WAIT.
Ian.
--
Ian Campbell
Current Noise: Diamond Head - It's Electric
"The only real way to look younger is not to be born so soon."
-- Charles Schulz, "Things I've Had to Learn Over and
Over and Over"
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2008-11-25 13:38 ` Ian Campbell
@ 2008-11-25 13:57 ` Trond Myklebust
2008-11-25 14:04 ` Ian Campbell
0 siblings, 1 reply; 50+ messages in thread
From: Trond Myklebust @ 2008-11-25 13:57 UTC (permalink / raw)
To: Ian Campbell
Cc: linux-nfs, Max Kellermann, linux-kernel, gcosta, Grant Coady,
J. Bruce Fields, Tom Tucker
On Tue, 2008-11-25 at 13:38 +0000, Ian Campbell wrote:
> > That would indicate that the server is failing to close the TCP
> > connection when the client closes on its end.
> >
> > Could you remind me what server you are using?
>
> 2.6.25-2-486 which is a Debian package from backports.org, changelog
> indicates that it contains 2.6.25.7.
Hmm... It should normally close sockets when the state changes. There
might be a race, though...
> > Also, does 'netstat -t'
> > show connections that are stuck in the CLOSE_WAIT state when you see the
> > hang?
>
> I'd have to wait for it to reproduce again to be 100% sure but according
> to http://lkml.indiana.edu/hypermail/linux/kernel/0808.3/0120.html
> I was seeing connections in FIN_WAIT2 but not CLOSE_WAIT.
That would be on the client side. I'm talking about the server.
Cheers
Trond
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2008-11-25 13:57 ` Trond Myklebust
@ 2008-11-25 14:04 ` Ian Campbell
2008-11-26 22:12 ` Ian Campbell
0 siblings, 1 reply; 50+ messages in thread
From: Ian Campbell @ 2008-11-25 14:04 UTC (permalink / raw)
To: Trond Myklebust
Cc: linux-nfs, Max Kellermann, linux-kernel, gcosta, Grant Coady,
J. Bruce Fields, Tom Tucker
On Tue, 2008-11-25 at 08:57 -0500, Trond Myklebust wrote:
> On Tue, 2008-11-25 at 13:38 +0000, Ian Campbell wrote:
> > > That would indicate that the server is failing to close the TCP
> > > connection when the client closes on its end.
> > >
> > > Could you remind me what server you are using?
> >
> > 2.6.25-2-486 which is a Debian package from backports.org, changelog
> > indicates that it contains 2.6.25.7.
>
> Hmm... It should normally close sockets when the state changes. There
> might be a race, though...
>
> > > Also, does 'netstat -t'
> > > show connections that are stuck in the CLOSE_WAIT state when you see the
> > > hang?
> >
> > I'd have to wait for it to reproduce again to be 100% sure but according
> > to http://lkml.indiana.edu/hypermail/linux/kernel/0808.3/0120.html
> > I was seeing connections in FIN_WAIT2 but not CLOSE_WAIT.
>
> That would be on the client side. I'm talking about the server.
Ah, OK. I'll abort my current test of 2.6.26+revert and wait for a repro
so I can netstat the server, give me a couple of days...
Ian.
--
Ian Campbell
It is more rational to sacrifice one life than six.
-- Spock, "The Galileo Seven", stardate 2822.3
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2008-11-25 14:04 ` Ian Campbell
@ 2008-11-26 22:12 ` Ian Campbell
2008-12-01 0:17 ` [PATCH 0/3] " Trond Myklebust
0 siblings, 1 reply; 50+ messages in thread
From: Ian Campbell @ 2008-11-26 22:12 UTC (permalink / raw)
To: Trond Myklebust
Cc: linux-nfs, Max Kellermann, linux-kernel, gcosta, Grant Coady,
J. Bruce Fields, Tom Tucker
[-- Attachment #1: Type: text/plain, Size: 5264 bytes --]
On Tue, 2008-11-25 at 14:04 +0000, Ian Campbell wrote:
> On Tue, 2008-11-25 at 08:57 -0500, Trond Myklebust wrote:
> > On Tue, 2008-11-25 at 13:38 +0000, Ian Campbell wrote:
> > > > That would indicate that the server is failing to close the TCP
> > > > connection when the client closes on its end.
> > > >
> > > > Could you remind me what server you are using?
> > >
> > > 2.6.25-2-486 which is a Debian package from backports.org, changelog
> > > indicates that it contains 2.6.25.7.
> >
> > Hmm... It should normally close sockets when the state changes. There
> > might be a race, though...
> >
> > > > Also, does 'netstat -t'
> > > > show connections that are stuck in the CLOSE_WAIT state when you see the
> > > > hang?
> > >
> > > I'd have to wait for it to reproduce again to be 100% sure but according
> > > to http://lkml.indiana.edu/hypermail/linux/kernel/0808.3/0120.html
> > > I was seeing connections in FIN_WAIT2 but not CLOSE_WAIT.
> >
> > That would be on the client side. I'm talking about the server.
>
> Ah, OK. I'll abort my current test of 2.6.26+revert and wait for a repro
> so I can netstat the server, give me a couple of days...
So on the server I see the following. 192.168.1.4 is the problematic
client and 192.168.1.6 is the server.
Maybe not interesting but 192.168.1.5 also uses NFS for my $HOME and
runs 2.6.26 with no lockups.
# netstat -t -n
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 1 0 192.168.1.6:2049 192.168.1.4:723 CLOSE_WAIT
tcp 1 0 192.168.1.6:2049 192.168.1.4:920 CLOSE_WAIT
tcp 1 0 192.168.1.6:2049 192.168.1.4:890 CLOSE_WAIT
tcp 1 0 192.168.1.6:2049 192.168.1.4:698 CLOSE_WAIT
tcp 1 0 192.168.1.6:2049 192.168.1.4:705 CLOSE_WAIT
tcp 1 0 192.168.1.6:2049 192.168.1.4:943 CLOSE_WAIT
tcp 1 0 192.168.1.6:2049 192.168.1.4:915 CLOSE_WAIT
tcp 0 0 192.168.1.6:2049 192.168.1.5:783 ESTABLISHED
tcp 1 0 192.168.1.6:2049 192.168.1.4:998 CLOSE_WAIT
tcp 1 0 192.168.1.6:2049 192.168.1.4:758 CLOSE_WAIT
tcp 1 0 192.168.1.6:2049 192.168.1.4:955 CLOSE_WAIT
tcp 1 0 192.168.1.6:2049 192.168.1.4:845 CLOSE_WAIT
tcp 1 0 192.168.1.6:2049 192.168.1.4:827 CLOSE_WAIT
tcp 0 0 192.168.1.6:58464 128.31.0.36:80 ESTABLISHED
tcp 1 0 192.168.1.6:2049 192.168.1.4:754 CLOSE_WAIT
tcp 1 0 192.168.1.6:2049 192.168.1.4:837 CLOSE_WAIT
tcp 1 0 192.168.1.6:2049 192.168.1.4:918 CLOSE_WAIT
tcp 1 0 192.168.1.6:2049 192.168.1.4:865 CLOSE_WAIT
tcp 0 0 192.168.1.6:48343 192.168.1.5:832 ESTABLISHED
tcp 1 0 192.168.1.6:2049 192.168.1.4:840 CLOSE_WAIT
tcp 1 0 192.168.1.6:2049 192.168.1.4:883 CLOSE_WAIT
tcp 1 0 192.168.1.6:2049 192.168.1.4:785 CLOSE_WAIT
tcp 1 0 192.168.1.6:2049 192.168.1.4:720 CLOSE_WAIT
tcp6 0 0 ::ffff:192.168.1.6:22 ::ffff:192.168.1.:38206 ESTABLISHED
tcp6 0 0 ::ffff:192.168.1.6:143 ::ffff:192.168.1.:41308 ESTABLISHED
tcp6 0 0 ::ffff:192.168.1.6:143 ::ffff:192.168.1.:55784 ESTABLISHED
tcp6 0 0 ::ffff:192.168.1.6:22 ::ffff:192.168.1.:39046 ESTABLISHED
and on the client
# netstat -t -n
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 192.168.1.4:943 192.168.1.6:2049 FIN_WAIT2
tcp 0 0 192.168.1.4:33959 192.168.1.4:6543 ESTABLISHED
tcp 0 0 192.168.1.4:6543 192.168.1.4:54157 ESTABLISHED
tcp 0 0 127.0.0.1:13666 127.0.0.1:33364 ESTABLISHED
tcp 0 0 192.168.1.4:22 192.168.1.5:54696 ESTABLISHED
tcp 0 0 192.168.1.4:22 192.168.1.5:47599 ESTABLISHED
tcp 0 0 192.168.1.4:54156 192.168.1.4:6543 ESTABLISHED
tcp 0 0 192.168.1.4:6543 192.168.1.4:33957 ESTABLISHED
tcp 0 0 192.168.1.4:33957 192.168.1.4:6543 ESTABLISHED
tcp 0 0 192.168.1.4:54157 192.168.1.4:6543 ESTABLISHED
tcp 0 0 192.168.1.4:6543 192.168.1.4:54156 ESTABLISHED
tcp 0 0 192.168.1.4:6543 192.168.1.4:33959 ESTABLISHED
tcp 0 0 127.0.0.1:47756 127.0.0.1:6545 ESTABLISHED
tcp 0 0 127.0.0.1:33364 127.0.0.1:13666 ESTABLISHED
tcp 0 0 127.0.0.1:6545 127.0.0.1:47756 ESTABLISHED
>
> Ian.
--
Ian Campbell
Just once, I wish we would encounter an alien menace that wasn't
immune to bullets.
-- The Brigadier, "Dr. Who"
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 0/3] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2008-11-26 22:12 ` Ian Campbell
@ 2008-12-01 0:17 ` Trond Myklebust
2008-12-01 0:18 ` [PATCH 1/3] SUNRPC: Ensure the server closes sockets in a timely fashion Trond Myklebust
` (4 more replies)
0 siblings, 5 replies; 50+ messages in thread
From: Trond Myklebust @ 2008-12-01 0:17 UTC (permalink / raw)
To: Ian Campbell
Cc: linux-nfs, Max Kellermann, linux-kernel, gcosta, Grant Coady,
J. Bruce Fields, Tom Tucker
Can you see if the following 3 patches help? They're against 2.6.28-rc6,
but afaics the problems are pretty much the same on 2.6.26.
Cheers
Trond
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/3] SUNRPC: Ensure the server closes sockets in a timely fashion
2008-12-01 0:17 ` [PATCH 0/3] " Trond Myklebust
@ 2008-12-01 0:18 ` Trond Myklebust
2008-12-17 15:27 ` Tom Tucker
2008-12-01 0:19 ` [PATCH 2/3] SUNRPC: We only need to call svc_delete_xprt() once Trond Myklebust
` (3 subsequent siblings)
4 siblings, 1 reply; 50+ messages in thread
From: Trond Myklebust @ 2008-12-01 0:18 UTC (permalink / raw)
To: Ian Campbell
Cc: linux-nfs, Max Kellermann, linux-kernel, gcosta, Grant Coady,
J. Bruce Fields, Tom Tucker
We want to ensure that connected sockets close down the connection when we
set XPT_CLOSE, so that we don't keep it hanging while cleaning up all the
stuff that is keeping a reference to the socket.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
net/sunrpc/svcsock.c | 21 ++++++++++++++++++++-
1 files changed, 20 insertions(+), 1 deletions(-)
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 95293f5..a1b048d 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -59,6 +59,7 @@ static void svc_udp_data_ready(struct sock *, int);
static int svc_udp_recvfrom(struct svc_rqst *);
static int svc_udp_sendto(struct svc_rqst *);
static void svc_sock_detach(struct svc_xprt *);
+static void svc_tcp_sock_detach(struct svc_xprt *);
static void svc_sock_free(struct svc_xprt *);
static struct svc_xprt *svc_create_socket(struct svc_serv *, int,
@@ -1017,7 +1018,7 @@ static struct svc_xprt_ops svc_tcp_ops = {
.xpo_recvfrom = svc_tcp_recvfrom,
.xpo_sendto = svc_tcp_sendto,
.xpo_release_rqst = svc_release_skb,
- .xpo_detach = svc_sock_detach,
+ .xpo_detach = svc_tcp_sock_detach,
.xpo_free = svc_sock_free,
.xpo_prep_reply_hdr = svc_tcp_prep_reply_hdr,
.xpo_has_wspace = svc_tcp_has_wspace,
@@ -1282,6 +1283,24 @@ static void svc_sock_detach(struct svc_xprt *xprt)
sk->sk_state_change = svsk->sk_ostate;
sk->sk_data_ready = svsk->sk_odata;
sk->sk_write_space = svsk->sk_owspace;
+
+ if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
+ wake_up_interruptible(sk->sk_sleep);
+}
+
+/*
+ * Disconnect the socket, and reset the callbacks
+ */
+static void svc_tcp_sock_detach(struct svc_xprt *xprt)
+{
+ struct svc_sock *svsk = container_of(xprt, struct svc_sock, sk_xprt);
+
+ dprintk("svc: svc_tcp_sock_detach(%p)\n", svsk);
+
+ svc_sock_detach(xprt);
+
+ if (!test_bit(XPT_LISTENER, &xprt->xpt_flags))
+ kernel_sock_shutdown(svsk->sk_sock, SHUT_RDWR);
}
/*
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 2/3] SUNRPC: We only need to call svc_delete_xprt() once...
2008-12-01 0:17 ` [PATCH 0/3] " Trond Myklebust
2008-12-01 0:18 ` [PATCH 1/3] SUNRPC: Ensure the server closes sockets in a timely fashion Trond Myklebust
@ 2008-12-01 0:19 ` Trond Myklebust
2008-12-01 0:20 ` [PATCH 3/3] SUNRPC: svc_xprt_enqueue should not refuse to enqueue 'XPT_DEAD' transports Trond Myklebust
` (2 subsequent siblings)
4 siblings, 0 replies; 50+ messages in thread
From: Trond Myklebust @ 2008-12-01 0:19 UTC (permalink / raw)
To: Ian Campbell
Cc: linux-nfs, Max Kellermann, linux-kernel, gcosta, Grant Coady,
J. Bruce Fields, Tom Tucker
Use XPT_DEAD to ensure that we only call xpo_detach & friends once.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
net/sunrpc/svc_xprt.c | 17 +++++++++++------
1 files changed, 11 insertions(+), 6 deletions(-)
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index bf5b5cd..a417064 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -838,6 +838,10 @@ void svc_delete_xprt(struct svc_xprt *xprt)
{
struct svc_serv *serv = xprt->xpt_server;
+ /* Only do this once */
+ if (test_and_set_bit(XPT_DEAD, &xprt->xpt_flags) != 0)
+ return;
+
dprintk("svc: svc_delete_xprt(%p)\n", xprt);
xprt->xpt_ops->xpo_detach(xprt);
@@ -851,13 +855,14 @@ void svc_delete_xprt(struct svc_xprt *xprt)
* while still attached to a queue, the queue itself
* is about to be destroyed (in svc_destroy).
*/
- if (!test_and_set_bit(XPT_DEAD, &xprt->xpt_flags)) {
- BUG_ON(atomic_read(&xprt->xpt_ref.refcount) < 2);
- if (test_bit(XPT_TEMP, &xprt->xpt_flags))
- serv->sv_tmpcnt--;
- svc_xprt_put(xprt);
- }
+ if (test_bit(XPT_TEMP, &xprt->xpt_flags))
+ serv->sv_tmpcnt--;
spin_unlock_bh(&serv->sv_lock);
+
+ /* FIXME: Is this really needed here? */
+ BUG_ON(atomic_read(&xprt->xpt_ref.refcount) < 2);
+
+ svc_xprt_put(xprt);
}
void svc_close_xprt(struct svc_xprt *xprt)
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 3/3] SUNRPC: svc_xprt_enqueue should not refuse to enqueue 'XPT_DEAD' transports
2008-12-01 0:17 ` [PATCH 0/3] " Trond Myklebust
2008-12-01 0:18 ` [PATCH 1/3] SUNRPC: Ensure the server closes sockets in a timely fashion Trond Myklebust
2008-12-01 0:19 ` [PATCH 2/3] SUNRPC: We only need to call svc_delete_xprt() once Trond Myklebust
@ 2008-12-01 0:20 ` Trond Myklebust
2008-12-17 15:35 ` Tom Tucker
2008-12-01 0:29 ` [PATCH 0/3] NFS regression in 2.6.26?, "task blocked for more than 120 seconds" Trond Myklebust
2008-12-01 22:09 ` Ian Campbell
4 siblings, 1 reply; 50+ messages in thread
From: Trond Myklebust @ 2008-12-01 0:20 UTC (permalink / raw)
To: Ian Campbell
Cc: linux-nfs, Max Kellermann, linux-kernel, gcosta, Grant Coady,
J. Bruce Fields, Tom Tucker
Aside from being racy (there is nothing preventing someone setting XPT_DEAD
after the test in svc_xprt_enqueue, and before XPT_BUSY is set), it is
wrong to assume that transports which have called svc_delete_xprt() might
not need to be re-enqueued.
See the list of deferred requests, which is currently never going to
be cleared if the revisit call happens after svc_delete_xprt(). In this
case, the deferred request will currently keep a reference to the transport
forever.
The fix should be to allow dead transports to be enqueued in order to clear
the deferred requests, then change the order of processing in svc_recv() so
that we pick up deferred requests before we do the XPT_CLOSE processing.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
net/sunrpc/svc_xprt.c | 124 +++++++++++++++++++++++++++----------------------
1 files changed, 69 insertions(+), 55 deletions(-)
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index a417064..b54cf84 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -297,10 +297,15 @@ void svc_xprt_enqueue(struct svc_xprt *xprt)
struct svc_serv *serv = xprt->xpt_server;
struct svc_pool *pool;
struct svc_rqst *rqstp;
+ unsigned long flags;
int cpu;
- if (!(xprt->xpt_flags &
- ((1<<XPT_CONN)|(1<<XPT_DATA)|(1<<XPT_CLOSE)|(1<<XPT_DEFERRED))))
+ flags = xprt->xpt_flags &
+ (1UL<<XPT_CONN | 1UL<<XPT_DATA | 1UL<<XPT_CLOSE |
+ 1UL<<XPT_DEAD | 1UL<<XPT_DEFERRED);
+ if (flags == 0)
+ return;
+ if ((flags & 1UL<<XPT_DEAD) != 0 && (flags & 1UL<<XPT_DEFERRED) == 0)
return;
cpu = get_cpu();
@@ -315,12 +320,6 @@ void svc_xprt_enqueue(struct svc_xprt *xprt)
"svc_xprt_enqueue: "
"threads and transports both waiting??\n");
- if (test_bit(XPT_DEAD, &xprt->xpt_flags)) {
- /* Don't enqueue dead transports */
- dprintk("svc: transport %p is dead, not enqueued\n", xprt);
- goto out_unlock;
- }
-
/* Mark transport as busy. It will remain in this state until
* the provider calls svc_xprt_received. We update XPT_BUSY
* atomically because it also guards against trying to enqueue
@@ -566,6 +565,7 @@ static void svc_check_conn_limits(struct svc_serv *serv)
int svc_recv(struct svc_rqst *rqstp, long timeout)
{
struct svc_xprt *xprt = NULL;
+ struct svc_xprt *newxpt;
struct svc_serv *serv = rqstp->rq_server;
struct svc_pool *pool = rqstp->rq_pool;
int len, i;
@@ -673,62 +673,76 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
spin_unlock_bh(&pool->sp_lock);
len = 0;
+
+ /*
+ * Deal with deferred requests first, since they need to be
+ * dequeued and dropped if the transport has been closed.
+ */
+ rqstp->rq_deferred = svc_deferred_dequeue(xprt);
+ if (rqstp->rq_deferred) {
+ svc_xprt_received(xprt);
+ len = svc_deferred_recv(rqstp);
+ }
+
if (test_bit(XPT_CLOSE, &xprt->xpt_flags)) {
dprintk("svc_recv: found XPT_CLOSE\n");
svc_delete_xprt(xprt);
- } else if (test_bit(XPT_LISTENER, &xprt->xpt_flags)) {
- struct svc_xprt *newxpt;
- newxpt = xprt->xpt_ops->xpo_accept(xprt);
- if (newxpt) {
- /*
- * We know this module_get will succeed because the
- * listener holds a reference too
- */
- __module_get(newxpt->xpt_class->xcl_owner);
- svc_check_conn_limits(xprt->xpt_server);
- spin_lock_bh(&serv->sv_lock);
- set_bit(XPT_TEMP, &newxpt->xpt_flags);
- list_add(&newxpt->xpt_list, &serv->sv_tempsocks);
- serv->sv_tmpcnt++;
- if (serv->sv_temptimer.function == NULL) {
- /* setup timer to age temp transports */
- setup_timer(&serv->sv_temptimer,
- svc_age_temp_xprts,
- (unsigned long)serv);
- mod_timer(&serv->sv_temptimer,
- jiffies + svc_conn_age_period * HZ);
- }
- spin_unlock_bh(&serv->sv_lock);
- svc_xprt_received(newxpt);
- }
- svc_xprt_received(xprt);
- } else {
- dprintk("svc: server %p, pool %u, transport %p, inuse=%d\n",
- rqstp, pool->sp_id, xprt,
- atomic_read(&xprt->xpt_ref.refcount));
- rqstp->rq_deferred = svc_deferred_dequeue(xprt);
- if (rqstp->rq_deferred) {
- svc_xprt_received(xprt);
- len = svc_deferred_recv(rqstp);
- } else
+ goto drop_request;
+ }
+
+ if (!test_bit(XPT_LISTENER, &xprt->xpt_flags)) {
+ if (len == 0) {
+ dprintk("svc: server %p, pool %u, transport %p, inuse=%d\n",
+ rqstp, pool->sp_id, xprt,
+ atomic_read(&xprt->xpt_ref.refcount));
len = xprt->xpt_ops->xpo_recvfrom(rqstp);
+
+ /* No data, incomplete (TCP) read, or accept() */
+ if (len == 0 || len == -EAGAIN)
+ goto drop_request;
+ }
+
dprintk("svc: got len=%d\n", len);
- }
- /* No data, incomplete (TCP) read, or accept() */
- if (len == 0 || len == -EAGAIN) {
- rqstp->rq_res.len = 0;
- svc_xprt_release(rqstp);
- return -EAGAIN;
+ clear_bit(XPT_OLD, &xprt->xpt_flags);
+
+ rqstp->rq_secure = svc_port_is_privileged(svc_addr(rqstp));
+ rqstp->rq_chandle.defer = svc_defer;
+
+ if (serv->sv_stats)
+ serv->sv_stats->netcnt++;
+ return len;
}
- clear_bit(XPT_OLD, &xprt->xpt_flags);
- rqstp->rq_secure = svc_port_is_privileged(svc_addr(rqstp));
- rqstp->rq_chandle.defer = svc_defer;
+ newxpt = xprt->xpt_ops->xpo_accept(xprt);
+ if (newxpt) {
+ /*
+ * We know this module_get will succeed because the
+ * listener holds a reference too
+ */
+ __module_get(newxpt->xpt_class->xcl_owner);
+ svc_check_conn_limits(xprt->xpt_server);
+ spin_lock_bh(&serv->sv_lock);
+ set_bit(XPT_TEMP, &newxpt->xpt_flags);
+ list_add(&newxpt->xpt_list, &serv->sv_tempsocks);
+ serv->sv_tmpcnt++;
+ if (serv->sv_temptimer.function == NULL) {
+ /* setup timer to age temp transports */
+ setup_timer(&serv->sv_temptimer,
+ svc_age_temp_xprts,
+ (unsigned long)serv);
+ mod_timer(&serv->sv_temptimer,
+ jiffies + svc_conn_age_period * HZ);
+ }
+ spin_unlock_bh(&serv->sv_lock);
+ svc_xprt_received(newxpt);
+ }
+ svc_xprt_received(xprt);
- if (serv->sv_stats)
- serv->sv_stats->netcnt++;
- return len;
+drop_request:
+ rqstp->rq_res.len = 0;
+ svc_xprt_release(rqstp);
+ return -EAGAIN;
}
EXPORT_SYMBOL(svc_recv);
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 0/3] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2008-12-01 0:17 ` [PATCH 0/3] " Trond Myklebust
` (2 preceding siblings ...)
2008-12-01 0:20 ` [PATCH 3/3] SUNRPC: svc_xprt_enqueue should not refuse to enqueue 'XPT_DEAD' transports Trond Myklebust
@ 2008-12-01 0:29 ` Trond Myklebust
2008-12-02 15:22 ` Kasparek Tomas
2008-12-01 22:09 ` Ian Campbell
4 siblings, 1 reply; 50+ messages in thread
From: Trond Myklebust @ 2008-12-01 0:29 UTC (permalink / raw)
To: Ian Campbell
Cc: linux-nfs, Max Kellermann, linux-kernel, gcosta, Grant Coady,
J. Bruce Fields, Tom Tucker
On Sun, 2008-11-30 at 19:17 -0500, Trond Myklebust wrote:
> Can you see if the following 3 patches help? They're against 2.6.28-rc6,
> but afaics the problems are pretty much the same on 2.6.26.
>
> Cheers
> Trond
Sorry... I forgot to add that these 3 patches need to be applied to the
nfs server, not the client.
Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 0/3] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2008-12-01 0:17 ` [PATCH 0/3] " Trond Myklebust
` (3 preceding siblings ...)
2008-12-01 0:29 ` [PATCH 0/3] NFS regression in 2.6.26?, "task blocked for more than 120 seconds" Trond Myklebust
@ 2008-12-01 22:09 ` Ian Campbell
2008-12-06 12:16 ` Ian Campbell
4 siblings, 1 reply; 50+ messages in thread
From: Ian Campbell @ 2008-12-01 22:09 UTC (permalink / raw)
To: Trond Myklebust
Cc: linux-nfs, Max Kellermann, linux-kernel, gcosta, Grant Coady,
J. Bruce Fields, Tom Tucker
[-- Attachment #1: Type: text/plain, Size: 561 bytes --]
On Sun, 2008-11-30 at 19:17 -0500, Trond Myklebust wrote:
> Can you see if the following 3 patches help? They're against 2.6.28-rc6,
> but afaics the problems are pretty much the same on 2.6.26.
Thanks.
The server was actually running 2.6.25.7 but the matching sources have
since been removed the backports.org so I've reproduce with 2.6.26 and
now I'll add the patches.
Ian.
--
Ian Campbell
It has been said that man is a rational animal. All my life I have
been searching for evidence which could support this.
-- Bertrand Russell
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 0/3] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2008-12-01 0:29 ` [PATCH 0/3] NFS regression in 2.6.26?, "task blocked for more than 120 seconds" Trond Myklebust
@ 2008-12-02 15:22 ` Kasparek Tomas
2008-12-02 15:37 ` Trond Myklebust
0 siblings, 1 reply; 50+ messages in thread
From: Kasparek Tomas @ 2008-12-02 15:22 UTC (permalink / raw)
To: Trond Myklebust
Cc: Ian Campbell, linux-nfs, Max Kellermann, linux-kernel, gcosta,
Grant Coady, J. Bruce Fields, Tom Tucker
On Sun, Nov 30, 2008 at 07:29:40PM -0500, Trond Myklebust wrote:
> On Sun, 2008-11-30 at 19:17 -0500, Trond Myklebust wrote:
> > Can you see if the following 3 patches help? They're against 2.6.28-rc6,
> > but afaics the problems are pretty much the same on 2.6.26.
>
> Sorry... I forgot to add that these 3 patches need to be applied to the
> nfs server, not the client.
Hi,
I have the problem on client side and can not change server (FreeBSD 7.0).
these patches does not change the situation (and they are probably not
supposed to do so, just giving it a try). After few minutes I got this on
the client with 2.6.28-rc6 with patches:
tcp 0 0 147.229.12.146:674 147.229.176.14:2049 FIN_WAIT2
Applying reverse e06799f958bf7f9f8fae15f0c6f519953fb0257c suggested by Ian
does help on the other side (with 2.6.27.4).
Bye
--
Tomas Kasparek, PhD student E-mail: kasparek@fit.vutbr.cz
CVT FIT VUT Brno, L127 Web: http://www.fit.vutbr.cz/~kasparek
Bozetechova 1, 612 66 Fax: +420 54114-1270
Brno, Czech Republic Phone: +420 54114-1220
jabber: tomas.kasparek@jabber.cz
GPG: 2F1E 1AAF FD3B CFA3 1537 63BD DCBE 18FF A035 53BC
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 0/3] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2008-12-02 15:22 ` Kasparek Tomas
@ 2008-12-02 15:37 ` Trond Myklebust
2008-12-02 16:26 ` Kasparek Tomas
0 siblings, 1 reply; 50+ messages in thread
From: Trond Myklebust @ 2008-12-02 15:37 UTC (permalink / raw)
To: Kasparek Tomas
Cc: Ian Campbell, linux-nfs, Max Kellermann, linux-kernel, gcosta,
Grant Coady, J. Bruce Fields, Tom Tucker
On Tue, 2008-12-02 at 16:22 +0100, Kasparek Tomas wrote:
> On Sun, Nov 30, 2008 at 07:29:40PM -0500, Trond Myklebust wrote:
> > On Sun, 2008-11-30 at 19:17 -0500, Trond Myklebust wrote:
> > > Can you see if the following 3 patches help? They're against 2.6.28-rc6,
> > > but afaics the problems are pretty much the same on 2.6.26.
> >
> > Sorry... I forgot to add that these 3 patches need to be applied to the
> > nfs server, not the client.
>
> Hi,
>
> I have the problem on client side and can not change server (FreeBSD 7.0).
> these patches does not change the situation (and they are probably not
> supposed to do so, just giving it a try). After few minutes I got this on
> the client with 2.6.28-rc6 with patches:
>
> tcp 0 0 147.229.12.146:674 147.229.176.14:2049 FIN_WAIT2
>
> Applying reverse e06799f958bf7f9f8fae15f0c6f519953fb0257c suggested by Ian
> does help on the other side (with 2.6.27.4).
Then I suggest working around the problem by reducing the value of the
sysctl net.ipv4.tcp_fin_timeout on the client.
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 0/3] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2008-12-02 15:37 ` Trond Myklebust
@ 2008-12-02 16:26 ` Kasparek Tomas
2008-12-02 18:10 ` Trond Myklebust
0 siblings, 1 reply; 50+ messages in thread
From: Kasparek Tomas @ 2008-12-02 16:26 UTC (permalink / raw)
To: Trond Myklebust
Cc: Ian Campbell, linux-nfs, Max Kellermann, linux-kernel, gcosta,
J. Bruce Fields, Tom Tucker
On Tue, Dec 02, 2008 at 10:37:02AM -0500, Trond Myklebust wrote:
> On Tue, 2008-12-02 at 16:22 +0100, Kasparek Tomas wrote:
> > On Sun, Nov 30, 2008 at 07:29:40PM -0500, Trond Myklebust wrote:
> > > On Sun, 2008-11-30 at 19:17 -0500, Trond Myklebust wrote:
> > > > Can you see if the following 3 patches help? They're against 2.6.28-rc6,
> > > > but afaics the problems are pretty much the same on 2.6.26.
> > >
> > > Sorry... I forgot to add that these 3 patches need to be applied to the
> > > nfs server, not the client.
> >
> > Hi,
> >
> > I have the problem on client side and can not change server (FreeBSD 7.0).
> > these patches does not change the situation (and they are probably not
> > supposed to do so, just giving it a try). After few minutes I got this on
> > the client with 2.6.28-rc6 with patches:
> >
> > tcp 0 0 147.229.12.146:674 147.229.176.14:2049 FIN_WAIT2
> >
> > Applying reverse e06799f958bf7f9f8fae15f0c6f519953fb0257c suggested by Ian
> > does help on the other side (with 2.6.27.4).
>
> Then I suggest working around the problem by reducing the value of the
> sysctl net.ipv4.tcp_fin_timeout on the client.
Did tried. The number should be seconds and defaults to 60, These
connections are still there after several hours. Changing it to 10 (sec)
and same behaviour. (BTW The server did not changed in last several months)
--
Tomas Kasparek, PhD student E-mail: kasparek@fit.vutbr.cz
CVT FIT VUT Brno, L127 Web: http://www.fit.vutbr.cz/~kasparek
Bozetechova 1, 612 66 Fax: +420 54114-1270
Brno, Czech Republic Phone: +420 54114-1220
jabber: tomas.kasparek@jabber.cz
GPG: 2F1E 1AAF FD3B CFA3 1537 63BD DCBE 18FF A035 53BC
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 0/3] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2008-12-02 16:26 ` Kasparek Tomas
@ 2008-12-02 18:10 ` Trond Myklebust
0 siblings, 0 replies; 50+ messages in thread
From: Trond Myklebust @ 2008-12-02 18:10 UTC (permalink / raw)
To: Kasparek Tomas
Cc: Ian Campbell, linux-nfs, Max Kellermann, linux-kernel, gcosta,
J. Bruce Fields, Tom Tucker
On Tue, 2008-12-02 at 17:26 +0100, Kasparek Tomas wrote:
> Did tried. The number should be seconds and defaults to 60, These
> connections are still there after several hours. Changing it to 10 (sec)
> and same behaviour. (BTW The server did not changed in last several months)
Are you seeing the same behaviour with 'netstat -t'?
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 0/3] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2008-12-01 22:09 ` Ian Campbell
@ 2008-12-06 12:16 ` Ian Campbell
2008-12-14 18:24 ` Ian Campbell
0 siblings, 1 reply; 50+ messages in thread
From: Ian Campbell @ 2008-12-06 12:16 UTC (permalink / raw)
To: Trond Myklebust
Cc: linux-nfs, Max Kellermann, linux-kernel, gcosta, Grant Coady,
J. Bruce Fields, Tom Tucker
[-- Attachment #1: Type: text/plain, Size: 1063 bytes --]
On Mon, 2008-12-01 at 22:09 +0000, Ian Campbell wrote:
> On Sun, 2008-11-30 at 19:17 -0500, Trond Myklebust wrote:
> > Can you see if the following 3 patches help? They're against 2.6.28-rc6,
> > but afaics the problems are pretty much the same on 2.6.26.
>
> Thanks.
>
> The server was actually running 2.6.25.7 but the matching sources have
> since been removed the backports.org so I've reproduce with 2.6.26 and
> now I'll add the patches.
Just a small progress report. Anecdotally I thought that unpatched
2.6.26.7 was worse than 2.6.25.7, mostly because it hung twice in the ~1
day I was running it where previously it was less frequent than once per
day.
With the patched server the client ran OK for 2.5 days then mysteriously
hung, the logs show none of the normal symptoms and my wife reset it
before I got home so I've no real clue what happened but I'm inclined to
think it was unrelated for now. I'll get back to you in a week or so if
the problem hasn't reoccurred.
Ian.
--
Ian Campbell
It's later than you think.
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 0/3] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2008-12-06 12:16 ` Ian Campbell
@ 2008-12-14 18:24 ` Ian Campbell
2008-12-16 17:55 ` J. Bruce Fields
0 siblings, 1 reply; 50+ messages in thread
From: Ian Campbell @ 2008-12-14 18:24 UTC (permalink / raw)
To: Trond Myklebust
Cc: linux-nfs, Max Kellermann, linux-kernel, gcosta, Grant Coady,
J. Bruce Fields, Tom Tucker
[-- Attachment #1: Type: text/plain, Size: 1440 bytes --]
On Sat, 2008-12-06 at 12:16 +0000, Ian Campbell wrote:
> On Mon, 2008-12-01 at 22:09 +0000, Ian Campbell wrote:
> > On Sun, 2008-11-30 at 19:17 -0500, Trond Myklebust wrote:
> > > Can you see if the following 3 patches help? They're against 2.6.28-rc6,
> > > but afaics the problems are pretty much the same on 2.6.26.
> >
> > Thanks.
> >
> > The server was actually running 2.6.25.7 but the matching sources have
> > since been removed the backports.org so I've reproduce with 2.6.26 and
> > now I'll add the patches.
>
> Just a small progress report. Anecdotally I thought that unpatched
> 2.6.26.7 was worse than 2.6.25.7, mostly because it hung twice in the ~1
> day I was running it where previously it was less frequent than once per
> day.
>
> With the patched server the client ran OK for 2.5 days then mysteriously
> hung, the logs show none of the normal symptoms and my wife reset it
> before I got home so I've no real clue what happened but I'm inclined to
> think it was unrelated for now. I'll get back to you in a week or so if
> the problem hasn't reoccurred.
$ uptime
18:15:29 up 9 days, 22 min, 1 user, load average: 0.74, 0.64, 0.46
This is on the problematic client, so it looks like the server side fix
has sorted it. Thanks very much Trond.
Ian.
--
Ian Campbell
You have only to mumble a few words in church to get married and few words
in your sleep to get divorced.
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 0/3] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2008-12-14 18:24 ` Ian Campbell
@ 2008-12-16 17:55 ` J. Bruce Fields
2008-12-16 18:39 ` Ian Campbell
0 siblings, 1 reply; 50+ messages in thread
From: J. Bruce Fields @ 2008-12-16 17:55 UTC (permalink / raw)
To: Ian Campbell
Cc: Trond Myklebust, linux-nfs, Max Kellermann, linux-kernel, gcosta,
Grant Coady, Tom Tucker
On Sun, Dec 14, 2008 at 06:24:05PM +0000, Ian Campbell wrote:
> On Sat, 2008-12-06 at 12:16 +0000, Ian Campbell wrote:
> > On Mon, 2008-12-01 at 22:09 +0000, Ian Campbell wrote:
> > > On Sun, 2008-11-30 at 19:17 -0500, Trond Myklebust wrote:
> > > > Can you see if the following 3 patches help? They're against 2.6.28-rc6,
> > > > but afaics the problems are pretty much the same on 2.6.26.
> > >
> > > Thanks.
> > >
> > > The server was actually running 2.6.25.7 but the matching sources have
> > > since been removed the backports.org so I've reproduce with 2.6.26 and
> > > now I'll add the patches.
> >
> > Just a small progress report. Anecdotally I thought that unpatched
> > 2.6.26.7 was worse than 2.6.25.7, mostly because it hung twice in the ~1
> > day I was running it where previously it was less frequent than once per
> > day.
> >
> > With the patched server the client ran OK for 2.5 days then mysteriously
> > hung, the logs show none of the normal symptoms and my wife reset it
> > before I got home so I've no real clue what happened but I'm inclined to
> > think it was unrelated for now. I'll get back to you in a week or so if
> > the problem hasn't reoccurred.
>
> $ uptime
> 18:15:29 up 9 days, 22 min, 1 user, load average: 0.74, 0.64, 0.46
>
> This is on the problematic client, so it looks like the server side fix
> has sorted it. Thanks very much Trond.
Thanks for the testing! So this was with the following three patches
applied on the server on top of 2.6.26?
[PATCH 1/3] SUNRPC: Ensure the server closes sockets in a timely fashion
[PATCH 2/3] SUNRPC: We only need to call svc_delete_xprt() once...
[PATCH 3/3] SUNRPC: svc_xprt_enqueue should not refuse to enqueue 'XPT_DEAD' transports
I'll try to take a look at these before I leave for the holidays,
assuming the versions Trond posted on Nov. 30 are the latest.
--b.
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 0/3] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2008-12-16 17:55 ` J. Bruce Fields
@ 2008-12-16 18:39 ` Ian Campbell
2009-01-07 22:21 ` J. Bruce Fields
0 siblings, 1 reply; 50+ messages in thread
From: Ian Campbell @ 2008-12-16 18:39 UTC (permalink / raw)
To: J. Bruce Fields
Cc: Trond Myklebust, linux-nfs, Max Kellermann, linux-kernel, gcosta,
Grant Coady, Tom Tucker
[-- Attachment #1: Type: text/plain, Size: 2195 bytes --]
On Tue, 2008-12-16 at 12:55 -0500, J. Bruce Fields wrote:
> On Sun, Dec 14, 2008 at 06:24:05PM +0000, Ian Campbell wrote:
> > On Sat, 2008-12-06 at 12:16 +0000, Ian Campbell wrote:
> > > On Mon, 2008-12-01 at 22:09 +0000, Ian Campbell wrote:
> > > > On Sun, 2008-11-30 at 19:17 -0500, Trond Myklebust wrote:
> > > > > Can you see if the following 3 patches help? They're against 2.6.28-rc6,
> > > > > but afaics the problems are pretty much the same on 2.6.26.
> > > >
> > > > Thanks.
> > > >
> > > > The server was actually running 2.6.25.7 but the matching sources have
> > > > since been removed the backports.org so I've reproduce with 2.6.26 and
> > > > now I'll add the patches.
> > >
> > > Just a small progress report. Anecdotally I thought that unpatched
> > > 2.6.26.7 was worse than 2.6.25.7, mostly because it hung twice in the ~1
> > > day I was running it where previously it was less frequent than once per
> > > day.
> > >
> > > With the patched server the client ran OK for 2.5 days then mysteriously
> > > hung, the logs show none of the normal symptoms and my wife reset it
> > > before I got home so I've no real clue what happened but I'm inclined to
> > > think it was unrelated for now. I'll get back to you in a week or so if
> > > the problem hasn't reoccurred.
> >
> > $ uptime
> > 18:15:29 up 9 days, 22 min, 1 user, load average: 0.74, 0.64, 0.46
> >
> > This is on the problematic client, so it looks like the server side fix
> > has sorted it. Thanks very much Trond.
>
> Thanks for the testing! So this was with the following three patches
> applied on the server on top of 2.6.26?
>
> [PATCH 1/3] SUNRPC: Ensure the server closes sockets in a timely fashion
> [PATCH 2/3] SUNRPC: We only need to call svc_delete_xprt() once...
> [PATCH 3/3] SUNRPC: svc_xprt_enqueue should not refuse to enqueue 'XPT_DEAD' transports
That's right, it was actually 2.6.26.7 FWIW.
> I'll try to take a look at these before I leave for the holidays,
> assuming the versions Trond posted on Nov. 30 are the latest.
Thanks.
Ian.
--
Ian Campbell
The light of a hundred stars does not equal the light of the moon.
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/3] SUNRPC: Ensure the server closes sockets in a timely fashion
2008-12-01 0:18 ` [PATCH 1/3] SUNRPC: Ensure the server closes sockets in a timely fashion Trond Myklebust
@ 2008-12-17 15:27 ` Tom Tucker
2008-12-17 18:08 ` Trond Myklebust
0 siblings, 1 reply; 50+ messages in thread
From: Tom Tucker @ 2008-12-17 15:27 UTC (permalink / raw)
To: Trond Myklebust
Cc: Ian Campbell, linux-nfs, Max Kellermann, linux-kernel, gcosta,
Grant Coady, J. Bruce Fields
Trond Myklebust wrote:
> We want to ensure that connected sockets close down the connection when we
> set XPT_CLOSE, so that we don't keep it hanging while cleaning up all the
> stuff that is keeping a reference to the socket.
>
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> ---
>
> net/sunrpc/svcsock.c | 21 ++++++++++++++++++++-
> 1 files changed, 20 insertions(+), 1 deletions(-)
>
>
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 95293f5..a1b048d 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -59,6 +59,7 @@ static void svc_udp_data_ready(struct sock *, int);
> static int svc_udp_recvfrom(struct svc_rqst *);
> static int svc_udp_sendto(struct svc_rqst *);
> static void svc_sock_detach(struct svc_xprt *);
> +static void svc_tcp_sock_detach(struct svc_xprt *);
> static void svc_sock_free(struct svc_xprt *);
>
> static struct svc_xprt *svc_create_socket(struct svc_serv *, int,
> @@ -1017,7 +1018,7 @@ static struct svc_xprt_ops svc_tcp_ops = {
> .xpo_recvfrom = svc_tcp_recvfrom,
> .xpo_sendto = svc_tcp_sendto,
> .xpo_release_rqst = svc_release_skb,
> - .xpo_detach = svc_sock_detach,
> + .xpo_detach = svc_tcp_sock_detach,
> .xpo_free = svc_sock_free,
> .xpo_prep_reply_hdr = svc_tcp_prep_reply_hdr,
> .xpo_has_wspace = svc_tcp_has_wspace,
> @@ -1282,6 +1283,24 @@ static void svc_sock_detach(struct svc_xprt *xprt)
> sk->sk_state_change = svsk->sk_ostate;
> sk->sk_data_ready = svsk->sk_odata;
> sk->sk_write_space = svsk->sk_owspace;
> +
> + if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
> + wake_up_interruptible(sk->sk_sleep);
> +}
> +
> +/*
> + * Disconnect the socket, and reset the callbacks
> + */
> +static void svc_tcp_sock_detach(struct svc_xprt *xprt)
> +{
> + struct svc_sock *svsk = container_of(xprt, struct svc_sock, sk_xprt);
> +
> + dprintk("svc: svc_tcp_sock_detach(%p)\n", svsk);
> +
> + svc_sock_detach(xprt);
> +
> + if (!test_bit(XPT_LISTENER, &xprt->xpt_flags))
> + kernel_sock_shutdown(svsk->sk_sock, SHUT_RDWR);
How is this different than what happens as an artifact of sock_release?
> }
>
> /*
>
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 3/3] SUNRPC: svc_xprt_enqueue should not refuse to enqueue 'XPT_DEAD' transports
2008-12-01 0:20 ` [PATCH 3/3] SUNRPC: svc_xprt_enqueue should not refuse to enqueue 'XPT_DEAD' transports Trond Myklebust
@ 2008-12-17 15:35 ` Tom Tucker
2008-12-17 19:07 ` Trond Myklebust
0 siblings, 1 reply; 50+ messages in thread
From: Tom Tucker @ 2008-12-17 15:35 UTC (permalink / raw)
To: Trond Myklebust
Cc: Ian Campbell, linux-nfs, Max Kellermann, linux-kernel, gcosta,
Grant Coady, J. Bruce Fields
Trond Myklebust wrote:
> Aside from being racy (there is nothing preventing someone setting XPT_DEAD
> after the test in svc_xprt_enqueue, and before XPT_BUSY is set), it is
> wrong to assume that transports which have called svc_delete_xprt() might
> not need to be re-enqueued.
This is only true because now you allow transports with XPT_DEAD set to
be enqueued -- yes?
>
> See the list of deferred requests, which is currently never going to
> be cleared if the revisit call happens after svc_delete_xprt(). In this
> case, the deferred request will currently keep a reference to the transport
> forever.
>
I agree this is a possibility and it needs to be fixed. I'm concerned
that the root cause is still there though. I thought the test case was
the client side timing out the connection. Why are there deferred
requests sitting on what is presumably an idle connection?
> The fix should be to allow dead transports to be enqueued in order to clear
> the deferred requests, then change the order of processing in svc_recv() so
> that we pick up deferred requests before we do the XPT_CLOSE processing.
>
Wouldn't it be simpler to clean up any deferred requests in the close
path instead of changing the meaning of XPT_DEAD and dispatching
N-threads to do the same?
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> ---
>
> net/sunrpc/svc_xprt.c | 124 +++++++++++++++++++++++++++----------------------
> 1 files changed, 69 insertions(+), 55 deletions(-)
>
>
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index a417064..b54cf84 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -297,10 +297,15 @@ void svc_xprt_enqueue(struct svc_xprt *xprt)
> struct svc_serv *serv = xprt->xpt_server;
> struct svc_pool *pool;
> struct svc_rqst *rqstp;
> + unsigned long flags;
> int cpu;
>
> - if (!(xprt->xpt_flags &
> - ((1<<XPT_CONN)|(1<<XPT_DATA)|(1<<XPT_CLOSE)|(1<<XPT_DEFERRED))))
> + flags = xprt->xpt_flags &
> + (1UL<<XPT_CONN | 1UL<<XPT_DATA | 1UL<<XPT_CLOSE |
> + 1UL<<XPT_DEAD | 1UL<<XPT_DEFERRED);
> + if (flags == 0)
> + return;
> + if ((flags & 1UL<<XPT_DEAD) != 0 && (flags & 1UL<<XPT_DEFERRED) == 0)
> return;
>
> cpu = get_cpu();
> @@ -315,12 +320,6 @@ void svc_xprt_enqueue(struct svc_xprt *xprt)
> "svc_xprt_enqueue: "
> "threads and transports both waiting??\n");
>
> - if (test_bit(XPT_DEAD, &xprt->xpt_flags)) {
> - /* Don't enqueue dead transports */
> - dprintk("svc: transport %p is dead, not enqueued\n", xprt);
> - goto out_unlock;
> - }
> -
> /* Mark transport as busy. It will remain in this state until
> * the provider calls svc_xprt_received. We update XPT_BUSY
> * atomically because it also guards against trying to enqueue
> @@ -566,6 +565,7 @@ static void svc_check_conn_limits(struct svc_serv *serv)
> int svc_recv(struct svc_rqst *rqstp, long timeout)
> {
> struct svc_xprt *xprt = NULL;
> + struct svc_xprt *newxpt;
> struct svc_serv *serv = rqstp->rq_server;
> struct svc_pool *pool = rqstp->rq_pool;
> int len, i;
> @@ -673,62 +673,76 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
> spin_unlock_bh(&pool->sp_lock);
>
> len = 0;
> +
> + /*
> + * Deal with deferred requests first, since they need to be
> + * dequeued and dropped if the transport has been closed.
> + */
> + rqstp->rq_deferred = svc_deferred_dequeue(xprt);
> + if (rqstp->rq_deferred) {
> + svc_xprt_received(xprt);
> + len = svc_deferred_recv(rqstp);
> + }
> +
> if (test_bit(XPT_CLOSE, &xprt->xpt_flags)) {
> dprintk("svc_recv: found XPT_CLOSE\n");
> svc_delete_xprt(xprt);
> - } else if (test_bit(XPT_LISTENER, &xprt->xpt_flags)) {
> - struct svc_xprt *newxpt;
> - newxpt = xprt->xpt_ops->xpo_accept(xprt);
> - if (newxpt) {
> - /*
> - * We know this module_get will succeed because the
> - * listener holds a reference too
> - */
> - __module_get(newxpt->xpt_class->xcl_owner);
> - svc_check_conn_limits(xprt->xpt_server);
> - spin_lock_bh(&serv->sv_lock);
> - set_bit(XPT_TEMP, &newxpt->xpt_flags);
> - list_add(&newxpt->xpt_list, &serv->sv_tempsocks);
> - serv->sv_tmpcnt++;
> - if (serv->sv_temptimer.function == NULL) {
> - /* setup timer to age temp transports */
> - setup_timer(&serv->sv_temptimer,
> - svc_age_temp_xprts,
> - (unsigned long)serv);
> - mod_timer(&serv->sv_temptimer,
> - jiffies + svc_conn_age_period * HZ);
> - }
> - spin_unlock_bh(&serv->sv_lock);
> - svc_xprt_received(newxpt);
> - }
> - svc_xprt_received(xprt);
> - } else {
> - dprintk("svc: server %p, pool %u, transport %p, inuse=%d\n",
> - rqstp, pool->sp_id, xprt,
> - atomic_read(&xprt->xpt_ref.refcount));
> - rqstp->rq_deferred = svc_deferred_dequeue(xprt);
> - if (rqstp->rq_deferred) {
> - svc_xprt_received(xprt);
> - len = svc_deferred_recv(rqstp);
> - } else
> + goto drop_request;
> + }
> +
> + if (!test_bit(XPT_LISTENER, &xprt->xpt_flags)) {
> + if (len == 0) {
> + dprintk("svc: server %p, pool %u, transport %p, inuse=%d\n",
> + rqstp, pool->sp_id, xprt,
> + atomic_read(&xprt->xpt_ref.refcount));
> len = xprt->xpt_ops->xpo_recvfrom(rqstp);
> +
> + /* No data, incomplete (TCP) read, or accept() */
> + if (len == 0 || len == -EAGAIN)
> + goto drop_request;
> + }
> +
> dprintk("svc: got len=%d\n", len);
> - }
>
> - /* No data, incomplete (TCP) read, or accept() */
> - if (len == 0 || len == -EAGAIN) {
> - rqstp->rq_res.len = 0;
> - svc_xprt_release(rqstp);
> - return -EAGAIN;
> + clear_bit(XPT_OLD, &xprt->xpt_flags);
> +
> + rqstp->rq_secure = svc_port_is_privileged(svc_addr(rqstp));
> + rqstp->rq_chandle.defer = svc_defer;
> +
> + if (serv->sv_stats)
> + serv->sv_stats->netcnt++;
> + return len;
> }
> - clear_bit(XPT_OLD, &xprt->xpt_flags);
>
> - rqstp->rq_secure = svc_port_is_privileged(svc_addr(rqstp));
> - rqstp->rq_chandle.defer = svc_defer;
> + newxpt = xprt->xpt_ops->xpo_accept(xprt);
> + if (newxpt) {
> + /*
> + * We know this module_get will succeed because the
> + * listener holds a reference too
> + */
> + __module_get(newxpt->xpt_class->xcl_owner);
> + svc_check_conn_limits(xprt->xpt_server);
> + spin_lock_bh(&serv->sv_lock);
> + set_bit(XPT_TEMP, &newxpt->xpt_flags);
> + list_add(&newxpt->xpt_list, &serv->sv_tempsocks);
> + serv->sv_tmpcnt++;
> + if (serv->sv_temptimer.function == NULL) {
> + /* setup timer to age temp transports */
> + setup_timer(&serv->sv_temptimer,
> + svc_age_temp_xprts,
> + (unsigned long)serv);
> + mod_timer(&serv->sv_temptimer,
> + jiffies + svc_conn_age_period * HZ);
> + }
> + spin_unlock_bh(&serv->sv_lock);
> + svc_xprt_received(newxpt);
> + }
> + svc_xprt_received(xprt);
>
> - if (serv->sv_stats)
> - serv->sv_stats->netcnt++;
> - return len;
> +drop_request:
> + rqstp->rq_res.len = 0;
> + svc_xprt_release(rqstp);
> + return -EAGAIN;
> }
> EXPORT_SYMBOL(svc_recv);
>
>
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/3] SUNRPC: Ensure the server closes sockets in a timely fashion
2008-12-17 15:27 ` Tom Tucker
@ 2008-12-17 18:08 ` Trond Myklebust
2008-12-17 18:59 ` Tom Tucker
0 siblings, 1 reply; 50+ messages in thread
From: Trond Myklebust @ 2008-12-17 18:08 UTC (permalink / raw)
To: Tom Tucker
Cc: Ian Campbell, linux-nfs, Max Kellermann, linux-kernel, gcosta,
Grant Coady, J. Bruce Fields
On Wed, 2008-12-17 at 09:27 -0600, Tom Tucker wrote:
> > + if (!test_bit(XPT_LISTENER, &xprt->xpt_flags))
> > + kernel_sock_shutdown(svsk->sk_sock, SHUT_RDWR);
>
> How is this different than what happens as an artifact of sock_release?
The point is that it is independent of whether or not something is
holding a reference to the svc_sock.
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/3] SUNRPC: Ensure the server closes sockets in a timely fashion
2008-12-17 18:08 ` Trond Myklebust
@ 2008-12-17 18:59 ` Tom Tucker
0 siblings, 0 replies; 50+ messages in thread
From: Tom Tucker @ 2008-12-17 18:59 UTC (permalink / raw)
To: Trond Myklebust
Cc: Ian Campbell, linux-nfs, Max Kellermann, linux-kernel, gcosta,
Grant Coady, J. Bruce Fields
Trond Myklebust wrote:
> On Wed, 2008-12-17 at 09:27 -0600, Tom Tucker wrote:
>>> + if (!test_bit(XPT_LISTENER, &xprt->xpt_flags))
>>> + kernel_sock_shutdown(svsk->sk_sock, SHUT_RDWR);
>> How is this different than what happens as an artifact of sock_release?
>
> The point is that it is independent of whether or not something is
> holding a reference to the svc_sock.
Thanks, makes sense.
>
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 3/3] SUNRPC: svc_xprt_enqueue should not refuse to enqueue 'XPT_DEAD' transports
2008-12-17 15:35 ` Tom Tucker
@ 2008-12-17 19:07 ` Trond Myklebust
2008-12-23 14:49 ` Tom Tucker
0 siblings, 1 reply; 50+ messages in thread
From: Trond Myklebust @ 2008-12-17 19:07 UTC (permalink / raw)
To: Tom Tucker
Cc: Ian Campbell, linux-nfs, Max Kellermann, linux-kernel, gcosta,
Grant Coady, J. Bruce Fields
On Wed, 2008-12-17 at 09:35 -0600, Tom Tucker wrote:
> Trond Myklebust wrote:
> > Aside from being racy (there is nothing preventing someone setting XPT_DEAD
> > after the test in svc_xprt_enqueue, and before XPT_BUSY is set), it is
> > wrong to assume that transports which have called svc_delete_xprt() might
> > not need to be re-enqueued.
>
> This is only true because now you allow transports with XPT_DEAD set to
> be enqueued -- yes?
>
> >
> > See the list of deferred requests, which is currently never going to
> > be cleared if the revisit call happens after svc_delete_xprt(). In this
> > case, the deferred request will currently keep a reference to the transport
> > forever.
> >
>
> I agree this is a possibility and it needs to be fixed. I'm concerned
> that the root cause is still there though. I thought the test case was
> the client side timing out the connection. Why are there deferred
> requests sitting on what is presumably an idle connection?
I haven't said that they are the cause of this test case. I've said that
deferred requests hold references to the socket that can obviously
deadlock. That needs to be fixed regardless of whether or not it is the
cause here.
There are plenty of situations in which the client may choose to close
the TCP socket even if there are outstanding requests. One of the most
common is when the user signals the process, so that an RPC call that
was partially transmitted (ran out of buffer space) gets cancelled
before it can finish transmitting. In that case the client has no choice
but to disconnect and immediately reconnect.
> > The fix should be to allow dead transports to be enqueued in order to clear
> > the deferred requests, then change the order of processing in svc_recv() so
> > that we pick up deferred requests before we do the XPT_CLOSE processing.
> >
>
> Wouldn't it be simpler to clean up any deferred requests in the close
> path instead of changing the meaning of XPT_DEAD and dispatching
> N-threads to do the same?
AFAICS, deferred requests are the property of the cache until they
expire or a downcall occurs. I'm not aware of any way to cancel only
those deferred requests that hold a reference to this particular
transport.
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 3/3] SUNRPC: svc_xprt_enqueue should not refuse to enqueue 'XPT_DEAD' transports
2008-12-17 19:07 ` Trond Myklebust
@ 2008-12-23 14:49 ` Tom Tucker
2008-12-23 23:39 ` Tom Tucker
0 siblings, 1 reply; 50+ messages in thread
From: Tom Tucker @ 2008-12-23 14:49 UTC (permalink / raw)
To: Trond Myklebust
Cc: Ian Campbell, linux-nfs, Max Kellermann, linux-kernel, gcosta,
Grant Coady, J. Bruce Fields
Trond Myklebust wrote:
> On Wed, 2008-12-17 at 09:35 -0600, Tom Tucker wrote:
>
>> Trond Myklebust wrote:
>>
>>> Aside from being racy (there is nothing preventing someone setting XPT_DEAD
>>> after the test in svc_xprt_enqueue, and before XPT_BUSY is set), it is
>>> wrong to assume that transports which have called svc_delete_xprt() might
>>> not need to be re-enqueued.
>>>
>> This is only true because now you allow transports with XPT_DEAD set to
>> be enqueued -- yes?
>>
>>
>>> See the list of deferred requests, which is currently never going to
>>> be cleared if the revisit call happens after svc_delete_xprt(). In this
>>> case, the deferred request will currently keep a reference to the transport
>>> forever.
>>>
>>>
>> I agree this is a possibility and it needs to be fixed. I'm concerned
>> that the root cause is still there though. I thought the test case was
>> the client side timing out the connection. Why are there deferred
>> requests sitting on what is presumably an idle connection?
>>
>
> I haven't said that they are the cause of this test case. I've said that
> deferred requests hold references to the socket that can obviously
> deadlock. That needs to be fixed regardless of whether or not it is the
> cause here.
>
> There are plenty of situations in which the client may choose to close
> the TCP socket even if there are outstanding requests. One of the most
> common is when the user signals the process, so that an RPC call that
> was partially transmitted (ran out of buffer space) gets cancelled
> before it can finish transmitting. In that case the client has no choice
> but to disconnect and immediately reconnect.
>
>
>>> The fix should be to allow dead transports to be enqueued in order to clear
>>> the deferred requests, then change the order of processing in svc_recv() so
>>> that we pick up deferred requests before we do the XPT_CLOSE processing.
>>>
>>>
>> Wouldn't it be simpler to clean up any deferred requests in the close
>> path instead of changing the meaning of XPT_DEAD and dispatching
>> N-threads to do the same?
>>
>
> AFAICS, deferred requests are the property of the cache until they
> expire or a downcall occurs. I'm not aware of any way to cancel only
> those deferred requests that hold a reference to this particular
> transport.
>
>
Ok, I think you're right, and I think that this fix is correct and makes
the symptom go away.
I may be completely confused here, but:
- The deferred requests should be getting cleaned up by timing out, and
that does not not seem to be happening, (Is this true?)
- By releasing the underlying connection prior to releasing the
transport that manages it, we've converted the visible resource leek to
an invisible one.
- This has been around forever and changing the client side close
behavior graceful exposed this bug,
So I'm wondering if what we want to do here is to provide a mechanism
for canceling deferred requests for a particular transport. This would
provide a mechanism for the generic transport driver to force
cancellation of deferred requests when closing.
Tom
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 3/3] SUNRPC: svc_xprt_enqueue should not refuse to enqueue 'XPT_DEAD' transports
2008-12-23 14:49 ` Tom Tucker
@ 2008-12-23 23:39 ` Tom Tucker
0 siblings, 0 replies; 50+ messages in thread
From: Tom Tucker @ 2008-12-23 23:39 UTC (permalink / raw)
To: Trond Myklebust
Cc: Ian Campbell, linux-nfs, Max Kellermann, linux-kernel, gcosta,
Grant Coady, J. Bruce Fields
Tom Tucker wrote:
> Trond Myklebust wrote:
>> On Wed, 2008-12-17 at 09:35 -0600, Tom Tucker wrote:
>>
>>> Trond Myklebust wrote:
>>>
>>>> Aside from being racy (there is nothing preventing someone setting
>>>> XPT_DEAD
>>>> after the test in svc_xprt_enqueue, and before XPT_BUSY is set), it is
>>>> wrong to assume that transports which have called svc_delete_xprt()
>>>> might
>>>> not need to be re-enqueued.
>>>>
>>> This is only true because now you allow transports with XPT_DEAD set
>>> to be enqueued -- yes?
>>>
>>>
>>>> See the list of deferred requests, which is currently never going to
>>>> be cleared if the revisit call happens after svc_delete_xprt(). In
>>>> this
>>>> case, the deferred request will currently keep a reference to the
>>>> transport
>>>> forever.
>>>>
>>>>
>>> I agree this is a possibility and it needs to be fixed. I'm
>>> concerned that the root cause is still there though. I thought the
>>> test case was the client side timing out the connection. Why are
>>> there deferred requests sitting on what is presumably an idle
>>> connection?
>>>
>>
>> I haven't said that they are the cause of this test case. I've said that
>> deferred requests hold references to the socket that can obviously
>> deadlock. That needs to be fixed regardless of whether or not it is the
>> cause here.
>>
>> There are plenty of situations in which the client may choose to close
>> the TCP socket even if there are outstanding requests. One of the most
>> common is when the user signals the process, so that an RPC call that
>> was partially transmitted (ran out of buffer space) gets cancelled
>> before it can finish transmitting. In that case the client has no choice
>> but to disconnect and immediately reconnect.
>>
>>
>>>> The fix should be to allow dead transports to be enqueued in order
>>>> to clear
>>>> the deferred requests, then change the order of processing in
>>>> svc_recv() so
>>>> that we pick up deferred requests before we do the XPT_CLOSE
>>>> processing.
>>>>
>>>>
>>> Wouldn't it be simpler to clean up any deferred requests in the
>>> close path instead of changing the meaning of XPT_DEAD and
>>> dispatching N-threads to do the same?
>>>
>>
>> AFAICS, deferred requests are the property of the cache until they
>> expire or a downcall occurs. I'm not aware of any way to cancel only
>> those deferred requests that hold a reference to this particular
>> transport.
>>
>>
> Ok, I think you're right, and I think that this fix is correct and
> makes the symptom go away.
>
> I may be completely confused here, but:
>
> - The deferred requests should be getting cleaned up by timing out,
> and that does not not seem to be happening, (Is this true?)
>
They are getting "cleaned up", but by the time they do the transport is
dead, the request has been added to the deferred queue, but it won't get
processed because svc_xprt_enqueue won't "schedule" a dead transport.
> - By releasing the underlying connection prior to releasing the
> transport that manages it, we've converted the visible resource leek
> to an invisible one.
>
Not with your changes per the above.
> - This has been around forever and changing the client side close
> behavior graceful exposed this bug,
>
> So I'm wondering if what we want to do here is to provide a mechanism
> for canceling deferred requests for a particular transport. This would
> provide a mechanism for the generic transport driver to force
> cancellation of deferred requests when closing.
This is a new interface and we'd still need to handle requests sitting
on the transport's deferred queue. Probably not a good idea.
> Tom
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 0/3] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2008-12-16 18:39 ` Ian Campbell
@ 2009-01-07 22:21 ` J. Bruce Fields
2009-01-08 18:20 ` J. Bruce Fields
0 siblings, 1 reply; 50+ messages in thread
From: J. Bruce Fields @ 2009-01-07 22:21 UTC (permalink / raw)
To: Ian Campbell
Cc: Trond Myklebust, linux-nfs, Max Kellermann, linux-kernel, gcosta,
Grant Coady, Tom Tucker
On Tue, Dec 16, 2008 at 06:39:35PM +0000, Ian Campbell wrote:
> That's right, it was actually 2.6.26.7 FWIW.
>
> > I'll try to take a look at these before I leave for the holidays,
> > assuming the versions Trond posted on Nov. 30 are the latest.
>
> Thanks.
Sorry for getting behind.
If you got a chance to retest with the for-2.6.29 branch at
git://linux-nfs.org/~bfields/linux.git for-2.6.29
that'd be great; that's what I intend to send to Linus.
--b.
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 0/3] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2009-01-07 22:21 ` J. Bruce Fields
@ 2009-01-08 18:20 ` J. Bruce Fields
2009-01-08 21:22 ` Ian Campbell
0 siblings, 1 reply; 50+ messages in thread
From: J. Bruce Fields @ 2009-01-08 18:20 UTC (permalink / raw)
To: Ian Campbell
Cc: Trond Myklebust, linux-nfs, Max Kellermann, linux-kernel, gcosta,
Grant Coady, Tom Tucker
On Wed, Jan 07, 2009 at 05:21:15PM -0500, J. Bruce Fields wrote:
> On Tue, Dec 16, 2008 at 06:39:35PM +0000, Ian Campbell wrote:
> > That's right, it was actually 2.6.26.7 FWIW.
> >
> > > I'll try to take a look at these before I leave for the holidays,
> > > assuming the versions Trond posted on Nov. 30 are the latest.
> >
> > Thanks.
>
> Sorry for getting behind.
>
> If you got a chance to retest with the for-2.6.29 branch at
>
> git://linux-nfs.org/~bfields/linux.git for-2.6.29
>
> that'd be great; that's what I intend to send to Linus.
(Merged now, so testing mainline as of today should work too.)
--b.
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 0/3] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2009-01-08 18:20 ` J. Bruce Fields
@ 2009-01-08 21:22 ` Ian Campbell
2009-01-08 21:26 ` J. Bruce Fields
0 siblings, 1 reply; 50+ messages in thread
From: Ian Campbell @ 2009-01-08 21:22 UTC (permalink / raw)
To: J. Bruce Fields
Cc: Trond Myklebust, linux-nfs, Max Kellermann, linux-kernel, gcosta,
Grant Coady, Tom Tucker
[-- Attachment #1: Type: text/plain, Size: 7657 bytes --]
On Thu, 2009-01-08 at 13:20 -0500, J. Bruce Fields wrote:
> On Wed, Jan 07, 2009 at 05:21:15PM -0500, J. Bruce Fields wrote:
> > On Tue, Dec 16, 2008 at 06:39:35PM +0000, Ian Campbell wrote:
> > > That's right, it was actually 2.6.26.7 FWIW.
> > >
> > > > I'll try to take a look at these before I leave for the holidays,
> > > > assuming the versions Trond posted on Nov. 30 are the latest.
> > >
> > > Thanks.
> >
> > Sorry for getting behind.
> >
> > If you got a chance to retest with the for-2.6.29 branch at
> >
> > git://linux-nfs.org/~bfields/linux.git for-2.6.29
> >
> > that'd be great; that's what I intend to send to Linus.
>
> (Merged now, so testing mainline as of today should work too.)
The server isn't really a machine I want to test random kernels on, is
there some subset of those changesets which it would be useful for me to
pull back onto the 2.6.26 kernel I'm using to test? (I can most like
manage the backporting myself).
These two look like the relevant ones to me but I'm not sure:
22945e4a1c7454c97f5d8aee1ef526c83fef3223 svc: Clean up deferred requests on transport destruction
69b6ba3712b796a66595cfaf0a5ab4dfe1cf964a SUNRPC: Ensure the server closes sockets in a timely fashion
I think 69b6 was in the set of three I tested previously and the other
two turned into 2294?
Ian.
Full list 238c6d54830c624f34ac9cf123ac04aebfca5013..nfs-bfields/for-2.6.29:
db43910cb42285a99f45f7e0a0a32e32d0b61dcf nfsd: get rid of NFSD_VERSION
87df4de8073f922a1f643b9fa6ba0412d5529ecf nfsd: last_byte_offset
4e65ebf08951326709817e654c149d0a94982e01 nfsd: delete wrong file comment from nfsd/nfs4xdr.c
df96fcf02a5fd2ae4e9b09e079dd6ef12d10ecd7 nfsd: git rid of nfs4_cb_null_ops declaration
0407717d8587f60003f4904bff27650cd836c00c nfsd: dprint each op status in nfsd4_proc_compound
b7aeda40d3010666d2c024c80557b6aa92a1a1ad nfsd: add etoosmall to nfserrno
30fa8c0157e4591ee2227aaa0b17cd3b0da5e6cb NFSD: FIDs need to take precedence over UUIDs
24c3767e41a6a59d32bb45abe899eb194e6bf1b8 SUNRPC: The sunrpc server code should not be used by out-of-tree modules
22945e4a1c7454c97f5d8aee1ef526c83fef3223 svc: Clean up deferred requests on transport destruction
9a8d248e2d2e9c880ac4561f27fea5dc200655bd nfsd: fix double-locks of directory mutex
2779e3ae39645515cb6c1126634f47c28c9e7190 svc: Move kfree of deferral record to common code
f05ef8db1abe68e3f6fc272efee51bc54ce528c5 CRED: Fix NFSD regression
0dba7c2a9ed3d4a1e58f5d94fffa9f44dbe012e6 NLM: Clean up flow of control in make_socks() function
d3fe5ea7cf815c037c90b1f1464ffc1ab5e8601b NLM: Refactor make_socks() function
55ef1274dddd4de387c54d110e354ffbb6cdc706 nfsd: Ensure nfsv4 calls the underlying filesystem on LOCKT
69b6ba3712b796a66595cfaf0a5ab4dfe1cf964a SUNRPC: Ensure the server closes sockets in a timely fashion
262a09823bb07c6aafb6c1d312cde613d0b90c85 NFSD: Add documenting comments for nfsctl interface
9e074856caf13ba83363f73759f5e395f74ccf41 NFSD: Replace open-coded integer with macro
54224f04ae95d86b27c0673cd773ebb120d86876 NFSD: Fix a handful of coding style issues in write_filehandle()
b046ccdc1f8171f6d0129dcc2a28d49187b4bf69 NFSD: clean up failover sysctl function naming
b064ec038a6180b13e5f89b6a30b42cb5ce8febc lockd: Enable NLM use of AF_INET6
57ef692588bc225853ca3267ca5b7cea2b07e058 NLM: Rewrite IPv4 privileged requester's check
d1208f70738c91f13b4eadb1b7a694082e439da2 NLM: nlm_privileged_requester() doesn't recognize mapped loopback address
49b5699b3fc22b363534c509c1b7dba06bc677bf NSM: Move nsm_create()
b7ba597fb964dfa44284904b3b3d74d44b8e1c42 NSM: Move nsm_use_hostnames to mon.c
8529bc51d30b8f001734b29b21a51b579c260f5b NSM: Move nsm_addr() to fs/lockd/mon.c
e6765b83977f07983c7a10e6bbb19d6c7bbfc3a4 NSM: Remove include/linux/lockd/sm_inter.h
94da7663db26530a8377f7219f8be8bd4d4822c2 NSM: Replace IP address as our nlm_reboot lookup key
77a3ef33e2de6fc8aabd7cb1700bfef81757c28a NSM: More clean up of nsm_get_handle()
b39b897c259fc1fd1998505f2b1d4ec1f115bce1 NSM: Refactor nsm_handle creation into a helper function
92fd91b998a5216a6d6606704e71d541a180216c NLM: Remove "create" argument from nsm_find()
8c7378fd2a5f22016542931b887a2ae98d146eaf NLM: Call nsm_reboot_lookup() instead of nsm_find()
3420a8c4359a189f7d854ed7075d151257415447 NSM: Add nsm_lookup() function
576df4634e37e46b441fefb91915184edb13bb94 NLM: Decode "priv" argument of NLMPROC_SM_NOTIFY as an opaque
7fefc9cb9d5f129c238d93166f705c96ca2e7e51 NLM: Change nlm_host_rebooted() to take a single nlm_reboot argument
cab2d3c99165abbba2943f1b269003b17fd3b1cb NSM: Encode the new "priv" cookie for NSMPROC_MON requests
7e44d3bea21fbb9494930d1cd35ca92a9a4a3279 NSM: Generate NSMPROC_MON's "priv" argument when nsm_handle is created
05f3a9af58180d24a9decedd71d4587935782d70 NSM: Remove !nsm check from nsm_release()
bc1cc6c4e476b60df48227165990c87a22db6bb7 NSM: Remove NULL pointer check from nsm_find()
5cf1c4b19db99d21d44c2ab457cfd44eb86b4439 NSM: Add dprintk() calls in nsm_find and nsm_release
67c6d107a689243979a2b5f15244b5261634a924 NSM: Move nsm_find() to fs/lockd/mon.c
03eb1dcbb799304b58730f4dba65812f49fb305e NSM: move to xdr_stream-based XDR encoders and decoders
36e8e668d3e6a61848a8921ddeb663b417299fa5 NSM: Move NSM program and procedure numbers to fs/lockd/mon.c
9c1bfd037f7ff8badaecb47418f109148d88bf45 NSM: Move NSM-related XDR data structures to lockd's xdr.h
0c7aef4569f8680951b7dee01dddffb9d2f809ff NSM: Check result of SM_UNMON upcall
356c3eb466fd1a12afd6448d90fba3922836e5f1 NLM: Move the public declaration of nsm_unmonitor() to lockd.h
c8c23c423dec49cb439697d3dc714e1500ff1610 NSM: Release nsmhandle in nlm_destroy_host
1e49323c4ab044d05bbc68cf13cadcbd4372468c NLM: Move the public declaration of nsm_monitor() to lockd.h
5d254b119823658cc318f88589c6c426b3d0a153 NSM: Make sure to return an error if the SM_MON call result is not zero
5bc74bef7c9b652f0f2aa9c5a8d5ac86881aba79 NSM: Remove BUG_ON() in nsm_monitor()
501c1ed3fb5c2648ba1709282c71617910917f66 NLM: Remove redundant printk() in nlmclnt_lock()
9fee49024ed19d849413df4ab6ec1a1a60aaae94 NSM: Use sm_name instead of h_name in nsm_monitor() and nsm_unmonitor()
29ed1407ed81086b778ebf12145b048ac3f7e10e NSM: Support IPv6 version of mon_name
f47534f7f0ac7727e05ec4274b764b181df2cf7f NSM: Use modern style for sm_name field in nsm_handle
5acf43155d1bcc412d892c73f64044f9a826cde6 NSM: convert printk(KERN_DEBUG) to a dprintk()
a4846750f090702e2fb848ac4fe5827bcef34060 NSM: Use C99 structure initializer to initialize nsm_args
afb03699dc0a920aed3322ad0e6895533941fb1e NLM: Add helper to handle IPv4 addresses
bc995801a09d1fead0bec1356bfd836911c8eed7 NLM: Support IPv6 scope IDs in nlm_display_address()
6999fb4016b2604c2f8a65586bba4a62a4b24ce7 NLM: Remove AF_UNSPEC arm in nlm_display_address()
1df40b609ad5a622904eb652109c287fe9c93ec5 NLM: Remove address eye-catcher buffers from nlm_host
7538ce1eb656a1477bedd5b1c202226e7abf5e7b NLM: Use modern style for pointer fields in nlm_host
c72a476b4b7ecadb80185de31236edb303c1a5d0 lockd: set svc_serv->sv_maxconn to a more reasonable value (try #3)
c9233eb7b0b11ef176d4bf68da2ce85464b6ec39 sunrpc: add sv_maxconn field to svc_serv (try #3)
548eaca46b3cf4419b6c2be839a106d8641ffb70 nfsd: document new filehandle fsid types
2bd9e7b62e6e1da3f881c40c73d93e9a212ce6de nfsd: Fix leaked memory in nfs4_make_rec_clidname
9346eff0dea1e5855fba25c9fe639d92a4db3135 nfsd: Minor cleanup of find_stateid
b3d47676d474ecd914c72049c87e71e5f0ffe040 nfsd: update fh_verify description
--
Ian Campbell
You're definitely on their list. The question to ask next is what list it is.
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 0/3] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2009-01-08 21:22 ` Ian Campbell
@ 2009-01-08 21:26 ` J. Bruce Fields
2009-01-12 9:46 ` Ian Campbell
2009-01-22 8:27 ` Ian Campbell
0 siblings, 2 replies; 50+ messages in thread
From: J. Bruce Fields @ 2009-01-08 21:26 UTC (permalink / raw)
To: Ian Campbell
Cc: Trond Myklebust, linux-nfs, Max Kellermann, linux-kernel, gcosta,
Grant Coady, Tom Tucker
On Thu, Jan 08, 2009 at 09:22:33PM +0000, Ian Campbell wrote:
> On Thu, 2009-01-08 at 13:20 -0500, J. Bruce Fields wrote:
> > On Wed, Jan 07, 2009 at 05:21:15PM -0500, J. Bruce Fields wrote:
> > > On Tue, Dec 16, 2008 at 06:39:35PM +0000, Ian Campbell wrote:
> > > > That's right, it was actually 2.6.26.7 FWIW.
> > > >
> > > > > I'll try to take a look at these before I leave for the holidays,
> > > > > assuming the versions Trond posted on Nov. 30 are the latest.
> > > >
> > > > Thanks.
> > >
> > > Sorry for getting behind.
> > >
> > > If you got a chance to retest with the for-2.6.29 branch at
> > >
> > > git://linux-nfs.org/~bfields/linux.git for-2.6.29
> > >
> > > that'd be great; that's what I intend to send to Linus.
> >
> > (Merged now, so testing mainline as of today should work too.)
>
> The server isn't really a machine I want to test random kernels on, is
> there some subset of those changesets which it would be useful for me to
> pull back onto the 2.6.26 kernel I'm using to test? (I can most like
> manage the backporting myself).
>
> These two look like the relevant ones to me but I'm not sure:
> 22945e4a1c7454c97f5d8aee1ef526c83fef3223 svc: Clean up deferred requests on transport destruction
> 69b6ba3712b796a66595cfaf0a5ab4dfe1cf964a SUNRPC: Ensure the server closes sockets in a timely fashion
>
> I think 69b6 was in the set of three I tested previously and the other
> two turned into 2294?
Yep, exactly.--b.
>
> Ian.
>
> Full list 238c6d54830c624f34ac9cf123ac04aebfca5013..nfs-bfields/for-2.6.29:
> db43910cb42285a99f45f7e0a0a32e32d0b61dcf nfsd: get rid of NFSD_VERSION
> 87df4de8073f922a1f643b9fa6ba0412d5529ecf nfsd: last_byte_offset
> 4e65ebf08951326709817e654c149d0a94982e01 nfsd: delete wrong file comment from nfsd/nfs4xdr.c
> df96fcf02a5fd2ae4e9b09e079dd6ef12d10ecd7 nfsd: git rid of nfs4_cb_null_ops declaration
> 0407717d8587f60003f4904bff27650cd836c00c nfsd: dprint each op status in nfsd4_proc_compound
> b7aeda40d3010666d2c024c80557b6aa92a1a1ad nfsd: add etoosmall to nfserrno
> 30fa8c0157e4591ee2227aaa0b17cd3b0da5e6cb NFSD: FIDs need to take precedence over UUIDs
> 24c3767e41a6a59d32bb45abe899eb194e6bf1b8 SUNRPC: The sunrpc server code should not be used by out-of-tree modules
> 22945e4a1c7454c97f5d8aee1ef526c83fef3223 svc: Clean up deferred requests on transport destruction
> 9a8d248e2d2e9c880ac4561f27fea5dc200655bd nfsd: fix double-locks of directory mutex
> 2779e3ae39645515cb6c1126634f47c28c9e7190 svc: Move kfree of deferral record to common code
> f05ef8db1abe68e3f6fc272efee51bc54ce528c5 CRED: Fix NFSD regression
> 0dba7c2a9ed3d4a1e58f5d94fffa9f44dbe012e6 NLM: Clean up flow of control in make_socks() function
> d3fe5ea7cf815c037c90b1f1464ffc1ab5e8601b NLM: Refactor make_socks() function
> 55ef1274dddd4de387c54d110e354ffbb6cdc706 nfsd: Ensure nfsv4 calls the underlying filesystem on LOCKT
> 69b6ba3712b796a66595cfaf0a5ab4dfe1cf964a SUNRPC: Ensure the server closes sockets in a timely fashion
> 262a09823bb07c6aafb6c1d312cde613d0b90c85 NFSD: Add documenting comments for nfsctl interface
> 9e074856caf13ba83363f73759f5e395f74ccf41 NFSD: Replace open-coded integer with macro
> 54224f04ae95d86b27c0673cd773ebb120d86876 NFSD: Fix a handful of coding style issues in write_filehandle()
> b046ccdc1f8171f6d0129dcc2a28d49187b4bf69 NFSD: clean up failover sysctl function naming
> b064ec038a6180b13e5f89b6a30b42cb5ce8febc lockd: Enable NLM use of AF_INET6
> 57ef692588bc225853ca3267ca5b7cea2b07e058 NLM: Rewrite IPv4 privileged requester's check
> d1208f70738c91f13b4eadb1b7a694082e439da2 NLM: nlm_privileged_requester() doesn't recognize mapped loopback address
> 49b5699b3fc22b363534c509c1b7dba06bc677bf NSM: Move nsm_create()
> b7ba597fb964dfa44284904b3b3d74d44b8e1c42 NSM: Move nsm_use_hostnames to mon.c
> 8529bc51d30b8f001734b29b21a51b579c260f5b NSM: Move nsm_addr() to fs/lockd/mon.c
> e6765b83977f07983c7a10e6bbb19d6c7bbfc3a4 NSM: Remove include/linux/lockd/sm_inter.h
> 94da7663db26530a8377f7219f8be8bd4d4822c2 NSM: Replace IP address as our nlm_reboot lookup key
> 77a3ef33e2de6fc8aabd7cb1700bfef81757c28a NSM: More clean up of nsm_get_handle()
> b39b897c259fc1fd1998505f2b1d4ec1f115bce1 NSM: Refactor nsm_handle creation into a helper function
> 92fd91b998a5216a6d6606704e71d541a180216c NLM: Remove "create" argument from nsm_find()
> 8c7378fd2a5f22016542931b887a2ae98d146eaf NLM: Call nsm_reboot_lookup() instead of nsm_find()
> 3420a8c4359a189f7d854ed7075d151257415447 NSM: Add nsm_lookup() function
> 576df4634e37e46b441fefb91915184edb13bb94 NLM: Decode "priv" argument of NLMPROC_SM_NOTIFY as an opaque
> 7fefc9cb9d5f129c238d93166f705c96ca2e7e51 NLM: Change nlm_host_rebooted() to take a single nlm_reboot argument
> cab2d3c99165abbba2943f1b269003b17fd3b1cb NSM: Encode the new "priv" cookie for NSMPROC_MON requests
> 7e44d3bea21fbb9494930d1cd35ca92a9a4a3279 NSM: Generate NSMPROC_MON's "priv" argument when nsm_handle is created
> 05f3a9af58180d24a9decedd71d4587935782d70 NSM: Remove !nsm check from nsm_release()
> bc1cc6c4e476b60df48227165990c87a22db6bb7 NSM: Remove NULL pointer check from nsm_find()
> 5cf1c4b19db99d21d44c2ab457cfd44eb86b4439 NSM: Add dprintk() calls in nsm_find and nsm_release
> 67c6d107a689243979a2b5f15244b5261634a924 NSM: Move nsm_find() to fs/lockd/mon.c
> 03eb1dcbb799304b58730f4dba65812f49fb305e NSM: move to xdr_stream-based XDR encoders and decoders
> 36e8e668d3e6a61848a8921ddeb663b417299fa5 NSM: Move NSM program and procedure numbers to fs/lockd/mon.c
> 9c1bfd037f7ff8badaecb47418f109148d88bf45 NSM: Move NSM-related XDR data structures to lockd's xdr.h
> 0c7aef4569f8680951b7dee01dddffb9d2f809ff NSM: Check result of SM_UNMON upcall
> 356c3eb466fd1a12afd6448d90fba3922836e5f1 NLM: Move the public declaration of nsm_unmonitor() to lockd.h
> c8c23c423dec49cb439697d3dc714e1500ff1610 NSM: Release nsmhandle in nlm_destroy_host
> 1e49323c4ab044d05bbc68cf13cadcbd4372468c NLM: Move the public declaration of nsm_monitor() to lockd.h
> 5d254b119823658cc318f88589c6c426b3d0a153 NSM: Make sure to return an error if the SM_MON call result is not zero
> 5bc74bef7c9b652f0f2aa9c5a8d5ac86881aba79 NSM: Remove BUG_ON() in nsm_monitor()
> 501c1ed3fb5c2648ba1709282c71617910917f66 NLM: Remove redundant printk() in nlmclnt_lock()
> 9fee49024ed19d849413df4ab6ec1a1a60aaae94 NSM: Use sm_name instead of h_name in nsm_monitor() and nsm_unmonitor()
> 29ed1407ed81086b778ebf12145b048ac3f7e10e NSM: Support IPv6 version of mon_name
> f47534f7f0ac7727e05ec4274b764b181df2cf7f NSM: Use modern style for sm_name field in nsm_handle
> 5acf43155d1bcc412d892c73f64044f9a826cde6 NSM: convert printk(KERN_DEBUG) to a dprintk()
> a4846750f090702e2fb848ac4fe5827bcef34060 NSM: Use C99 structure initializer to initialize nsm_args
> afb03699dc0a920aed3322ad0e6895533941fb1e NLM: Add helper to handle IPv4 addresses
> bc995801a09d1fead0bec1356bfd836911c8eed7 NLM: Support IPv6 scope IDs in nlm_display_address()
> 6999fb4016b2604c2f8a65586bba4a62a4b24ce7 NLM: Remove AF_UNSPEC arm in nlm_display_address()
> 1df40b609ad5a622904eb652109c287fe9c93ec5 NLM: Remove address eye-catcher buffers from nlm_host
> 7538ce1eb656a1477bedd5b1c202226e7abf5e7b NLM: Use modern style for pointer fields in nlm_host
> c72a476b4b7ecadb80185de31236edb303c1a5d0 lockd: set svc_serv->sv_maxconn to a more reasonable value (try #3)
> c9233eb7b0b11ef176d4bf68da2ce85464b6ec39 sunrpc: add sv_maxconn field to svc_serv (try #3)
> 548eaca46b3cf4419b6c2be839a106d8641ffb70 nfsd: document new filehandle fsid types
> 2bd9e7b62e6e1da3f881c40c73d93e9a212ce6de nfsd: Fix leaked memory in nfs4_make_rec_clidname
> 9346eff0dea1e5855fba25c9fe639d92a4db3135 nfsd: Minor cleanup of find_stateid
> b3d47676d474ecd914c72049c87e71e5f0ffe040 nfsd: update fh_verify description
>
>
> --
> Ian Campbell
>
> You're definitely on their list. The question to ask next is what list it is.
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 0/3] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2009-01-08 21:26 ` J. Bruce Fields
@ 2009-01-12 9:46 ` Ian Campbell
2009-01-22 8:27 ` Ian Campbell
1 sibling, 0 replies; 50+ messages in thread
From: Ian Campbell @ 2009-01-12 9:46 UTC (permalink / raw)
To: J. Bruce Fields
Cc: Trond Myklebust, linux-nfs, Max Kellermann, linux-kernel, gcosta,
Grant Coady, Tom Tucker
On Thu, 2009-01-08 at 16:26 -0500, J. Bruce Fields wrote:
> On Thu, Jan 08, 2009 at 09:22:33PM +0000, Ian Campbell wrote:
> > These two look like the relevant ones to me but I'm not sure:
> > 22945e4a1c7454c97f5d8aee1ef526c83fef3223 svc: Clean up deferred requests on transport destruction
> > 69b6ba3712b796a66595cfaf0a5ab4dfe1cf964a SUNRPC: Ensure the server closes sockets in a timely fashion
> >
> > I think 69b6 was in the set of three I tested previously and the other
> > two turned into 2294?
>
> Yep, exactly.--b.
OK, I have patched my 2.6.26 kernel with these and it is now running.
It'll be about a week before I can say with any certainty that the issue
hasn't recurred.
Ian.
--
Ian Campbell
Current Noise: Pitchshifter - Subject To Status
Do I have a lifestyle yet?
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 0/3] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2009-01-08 21:26 ` J. Bruce Fields
2009-01-12 9:46 ` Ian Campbell
@ 2009-01-22 8:27 ` Ian Campbell
2009-01-22 16:44 ` J. Bruce Fields
1 sibling, 1 reply; 50+ messages in thread
From: Ian Campbell @ 2009-01-22 8:27 UTC (permalink / raw)
To: J. Bruce Fields
Cc: Trond Myklebust, linux-nfs, Max Kellermann, linux-kernel, gcosta,
Grant Coady, Tom Tucker
[-- Attachment #1: Type: text/plain, Size: 1140 bytes --]
On Thu, 2009-01-08 at 16:26 -0500, J. Bruce Fields wrote:
>
> > > (Merged now, so testing mainline as of today should work too.)
> >
> > The server isn't really a machine I want to test random kernels on,
> is
> > there some subset of those changesets which it would be useful for
> me to
> > pull back onto the 2.6.26 kernel I'm using to test? (I can most like
> > manage the backporting myself).
> >
> > These two look like the relevant ones to me but I'm not sure:
> > 22945e4a1c7454c97f5d8aee1ef526c83fef3223 svc: Clean up deferred
> requests on transport destruction
> > 69b6ba3712b796a66595cfaf0a5ab4dfe1cf964a SUNRPC: Ensure the server
> closes sockets in a timely fashion
> >
> > I think 69b6 was in the set of three I tested previously and the
> other
> > two turned into 2294?
>
> Yep, exactly.--b.
The client machine now has an uptime of ten days without error after
these two patches were applied to the server.
Thanks everybody,
Ian.
--
Ian Campbell
I used to think that the brain was the most wonderful organ in
my body. Then I realized who was telling me this.
-- Emo Phillips
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 0/3] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2009-01-22 8:27 ` Ian Campbell
@ 2009-01-22 16:44 ` J. Bruce Fields
0 siblings, 0 replies; 50+ messages in thread
From: J. Bruce Fields @ 2009-01-22 16:44 UTC (permalink / raw)
To: Ian Campbell
Cc: Trond Myklebust, linux-nfs, Max Kellermann, linux-kernel, gcosta,
Grant Coady, Tom Tucker
On Thu, Jan 22, 2009 at 08:27:40AM +0000, Ian Campbell wrote:
> On Thu, 2009-01-08 at 16:26 -0500, J. Bruce Fields wrote:
> >
> > > > (Merged now, so testing mainline as of today should work too.)
> > >
> > > The server isn't really a machine I want to test random kernels on,
> > is
> > > there some subset of those changesets which it would be useful for
> > me to
> > > pull back onto the 2.6.26 kernel I'm using to test? (I can most like
> > > manage the backporting myself).
> > >
> > > These two look like the relevant ones to me but I'm not sure:
> > > 22945e4a1c7454c97f5d8aee1ef526c83fef3223 svc: Clean up deferred
> > requests on transport destruction
> > > 69b6ba3712b796a66595cfaf0a5ab4dfe1cf964a SUNRPC: Ensure the server
> > closes sockets in a timely fashion
> > >
> > > I think 69b6 was in the set of three I tested previously and the
> > other
> > > two turned into 2294?
> >
> > Yep, exactly.--b.
>
> The client machine now has an uptime of ten days without error after
> these two patches were applied to the server.
>
> Thanks everybody,
Very good, so upstream should be OK. Thanks for the testing!
--b.
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2008-10-20 6:51 ` Max Kellermann
2008-10-20 7:43 ` Ian Campbell
2008-10-20 13:15 ` Glauber Costa
@ 2009-05-22 20:59 ` H. Peter Anvin
2009-05-25 13:12 ` Max Kellermann
2 siblings, 1 reply; 50+ messages in thread
From: H. Peter Anvin @ 2009-05-22 20:59 UTC (permalink / raw)
To: Max Kellermann
Cc: Glauber Costa, linux-kernel, ijc, Grant Coady, Trond Myklebust,
J. Bruce Fields, Tom Tucker, Cyrill Gorcunov
Max Kellermann wrote:
> On 2008/10/17 16:33, Glauber Costa <glommer@redhat.com> wrote:
>> That's probably something related to apic congestion.
>> Does the problem go away if the only thing you change is this:
>>
>>
>>> @@ -891,11 +897,6 @@ do_rest:
>>> store_NMI_vector(&nmi_high, &nmi_low);
>>>
>>> smpboot_setup_warm_reset_vector(start_ip);
>>> - /*
>>> - * Be paranoid about clearing APIC errors.
>>> - */
>>> - apic_write(APIC_ESR, 0);
>>> - apic_read(APIC_ESR);
>>> }
>>
>> Please let me know.
>
> Hello Glauber,
>
> I have rebooted the server with 2.6.27.1 + this patchlet an hour ago.
> No problems since.
>
> Hardware: Compaq P4 Xeon server, Broadcom CMIC-WS / CIOB-X2 board.
> Tell me if you need more detailed information.
>
I was just pointed to this thread in the archives, and I'm really
wondering what was going on here. In particular, if this machine still
is around, I would appreciate a dump of /proc/cpuinfo as well as
dmidecode from this machine.
Thanks,
-hpa
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
2009-05-22 20:59 ` H. Peter Anvin
@ 2009-05-25 13:12 ` Max Kellermann
0 siblings, 0 replies; 50+ messages in thread
From: Max Kellermann @ 2009-05-25 13:12 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 376 bytes --]
On 2009/05/22 22:59, "H. Peter Anvin" <hpa@zytor.com> wrote:
> I was just pointed to this thread in the archives, and I'm really
> wondering what was going on here. In particular, if this machine still
> is around, I would appreciate a dump of /proc/cpuinfo as well as
> dmidecode from this machine.
Here it is. We run these machines with 2.6.29, the problem is gone.
Max
[-- Attachment #2: cpuinfo --]
[-- Type: text/plain, Size: 2264 bytes --]
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 3.06GHz
stepping : 5
cpu MHz : 3066.590
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pebs bts cid xtpr
bogomips : 6105.99
clflush size : 64
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 3.06GHz
stepping : 5
cpu MHz : 3066.590
cache size : 1024 KB
physical id : 3
siblings : 2
core id : 0
cpu cores : 1
apicid : 6
initial apicid : 6
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pebs bts cid xtpr
bogomips : 6106.57
clflush size : 64
power management:
processor : 2
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 3.06GHz
stepping : 5
cpu MHz : 3066.590
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
apicid : 1
initial apicid : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pebs bts cid xtpr
bogomips : 6106.63
clflush size : 64
power management:
processor : 3
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 3.06GHz
stepping : 5
cpu MHz : 3066.590
cache size : 1024 KB
physical id : 3
siblings : 2
core id : 0
cpu cores : 1
apicid : 7
initial apicid : 7
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pebs bts cid xtpr
bogomips : 6106.64
clflush size : 64
power management:
[-- Attachment #3: dmidecode --]
[-- Type: text/plain, Size: 14551 bytes --]
# dmidecode 2.9
SMBIOS 2.3 present.
49 structures occupying 1334 bytes.
Table at 0x000EC000.
Handle 0x0000, DMI type 0, 20 bytes
BIOS Information
Vendor: HP
Version: P29
Release Date: 07/25/2003
Address: 0xF0000
Runtime Size: 64 kB
ROM Size: 2048 kB
Characteristics:
PCI is supported
PNP is supported
BIOS is upgradeable
BIOS shadowing is allowed
ESCD support is available
Boot from CD is supported
Selectable boot is supported
5.25"/360 KB floppy services are supported (int 13h)
5.25"/1.2 MB floppy services are supported (int 13h)
3.5"/720 KB floppy services are supported (int 13h)
Print screen service is supported (int 5h)
8042 keyboard services are supported (int 9h)
Serial services are supported (int 14h)
Printer services are supported (int 17h)
CGA/mono video services are supported (int 10h)
ACPI is supported
USB legacy is supported
BIOS boot specification is supported
Handle 0x0100, DMI type 1, 25 bytes
System Information
Manufacturer: HP
Product Name: ProLiant DL380 G3
Version: Not Specified
Serial Number: 8090LDN31R
UUID: 38303930-4C44-4E33-3152-202020202020
Wake-up Type: Power Switch
Handle 0x0300, DMI type 3, 13 bytes
Chassis Information
Manufacturer: HP
Type: Rack Mount Chassis
Lock: Not Present
Version: Not Specified
Serial Number: 8090LDN31R
Asset Tag:
Boot-up State: Unknown
Power Supply State: Unknown
Thermal State: Unknown
Security Status: Unknown
Handle 0x0406, DMI type 4, 32 bytes
Processor Information
Socket Designation: Proc 1
Type: Central Processor
Family: Xeon
Manufacturer: Intel
ID: 25 0F 00 00 FF FB EB BF
Signature: Type 0, Family 15, Model 2, Stepping 5
Flags:
FPU (Floating-point unit on-chip)
VME (Virtual mode extension)
DE (Debugging extension)
PSE (Page size extension)
TSC (Time stamp counter)
MSR (Model specific registers)
PAE (Physical address extension)
MCE (Machine check exception)
CX8 (CMPXCHG8 instruction supported)
APIC (On-chip APIC hardware supported)
SEP (Fast system call)
MTRR (Memory type range registers)
PGE (Page global enable)
MCA (Machine check architecture)
CMOV (Conditional move instruction supported)
PAT (Page attribute table)
PSE-36 (36-bit page size extension)
CLFSH (CLFLUSH instruction supported)
DS (Debug store)
ACPI (ACPI supported)
MMX (MMX technology supported)
FXSR (Fast floating-point save and restore)
SSE (Streaming SIMD extensions)
SSE2 (Streaming SIMD extensions 2)
SS (Self-snoop)
HTT (Hyper-threading technology)
TM (Thermal monitor supported)
PBE (Pending break enabled)
Version: Not Specified
Voltage: 1.7 V
External Clock: 533 MHz
Max Speed: 3600 MHz
Current Speed: 3066 MHz
Status: Populated, Idle
Upgrade: ZIF Socket
L1 Cache Handle: 0x0716
L2 Cache Handle: 0x0726
L3 Cache Handle: 0x0736
Handle 0x0400, DMI type 4, 32 bytes
Processor Information
Socket Designation: Proc 2
Type: Central Processor
Family: Xeon
Manufacturer: Intel
ID: 25 0F 00 00 FF FB EB BF
Signature: Type 0, Family 15, Model 2, Stepping 5
Flags:
FPU (Floating-point unit on-chip)
VME (Virtual mode extension)
DE (Debugging extension)
PSE (Page size extension)
TSC (Time stamp counter)
MSR (Model specific registers)
PAE (Physical address extension)
MCE (Machine check exception)
CX8 (CMPXCHG8 instruction supported)
APIC (On-chip APIC hardware supported)
SEP (Fast system call)
MTRR (Memory type range registers)
PGE (Page global enable)
MCA (Machine check architecture)
CMOV (Conditional move instruction supported)
PAT (Page attribute table)
PSE-36 (36-bit page size extension)
CLFSH (CLFLUSH instruction supported)
DS (Debug store)
ACPI (ACPI supported)
MMX (MMX technology supported)
FXSR (Fast floating-point save and restore)
SSE (Streaming SIMD extensions)
SSE2 (Streaming SIMD extensions 2)
SS (Self-snoop)
HTT (Hyper-threading technology)
TM (Thermal monitor supported)
PBE (Pending break enabled)
Version: Not Specified
Voltage: 1.7 V
External Clock: 533 MHz
Max Speed: 3600 MHz
Current Speed: 3066 MHz
Status: Populated, Enabled
Upgrade: ZIF Socket
L1 Cache Handle: 0x0710
L2 Cache Handle: 0x0720
L3 Cache Handle: 0x0730
Handle 0x0716, DMI type 7, 19 bytes
Cache Information
Socket Designation: Processor 1 Internal L1 Cache
Configuration: Enabled, Not Socketed, Level 1
Operational Mode: Write Back
Location: Internal
Installed Size: 8 KB
Maximum Size: 32 KB
Supported SRAM Types:
Burst
Installed SRAM Type: Burst
Speed: Unknown
Error Correction Type: Unknown
System Type: Unknown
Associativity: 4-way Set-associative
Handle 0x0710, DMI type 7, 19 bytes
Cache Information
Socket Designation: Processor 2 Internal L1 Cache
Configuration: Enabled, Not Socketed, Level 1
Operational Mode: Write Back
Location: Internal
Installed Size: 8 KB
Maximum Size: 32 KB
Supported SRAM Types:
Burst
Installed SRAM Type: Burst
Speed: Unknown
Error Correction Type: Unknown
System Type: Unknown
Associativity: 4-way Set-associative
Handle 0x0726, DMI type 7, 19 bytes
Cache Information
Socket Designation: Processor 1 Internal L2 Cache
Configuration: Enabled, Not Socketed, Level 2
Operational Mode: Write Back
Location: Internal
Installed Size: 512 KB
Maximum Size: 2048 KB
Supported SRAM Types:
Burst
Installed SRAM Type: Burst
Speed: Unknown
Error Correction Type: Unknown
System Type: Unknown
Associativity: Other
Handle 0x0720, DMI type 7, 19 bytes
Cache Information
Socket Designation: Processor 2 Internal L2 Cache
Configuration: Enabled, Not Socketed, Level 2
Operational Mode: Write Back
Location: Internal
Installed Size: 512 KB
Maximum Size: 2048 KB
Supported SRAM Types:
Burst
Installed SRAM Type: Burst
Speed: Unknown
Error Correction Type: Unknown
System Type: Unknown
Associativity: Other
Handle 0x0736, DMI type 7, 19 bytes
Cache Information
Socket Designation: Processor 1 Internal L3 Cache
Configuration: Enabled, Not Socketed, Level 3
Operational Mode: Write Back
Location: Internal
Installed Size: 1024 KB
Maximum Size: 1024 KB
Supported SRAM Types:
Burst
Installed SRAM Type: Burst
Speed: Unknown
Error Correction Type: Unknown
System Type: Unknown
Associativity: Other
Handle 0x0730, DMI type 7, 19 bytes
Cache Information
Socket Designation: Processor 2 Internal L3 Cache
Configuration: Enabled, Not Socketed, Level 3
Operational Mode: Write Back
Location: Internal
Installed Size: 1024 KB
Maximum Size: 1024 KB
Supported SRAM Types:
Burst
Installed SRAM Type: Burst
Speed: Unknown
Error Correction Type: Unknown
System Type: Unknown
Associativity: Other
Handle 0x0801, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: J58
Internal Connector Type: Access Bus (USB)
External Reference Designator: USB Port 1
External Connector Type: Access Bus (USB)
Port Type: USB
Handle 0x0802, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: J58
Internal Connector Type: Access Bus (USB)
External Reference Designator: USB Port 2
External Connector Type: Access Bus (USB)
Port Type: USB
Handle 0x0901, DMI type 9, 13 bytes
System Slot Information
Designation: PCI Slot 1
Type: 64-bit PCI-X
Current Usage: Available
Length: Long
ID: 1
Characteristics:
3.3 V is provided
PME signal is supported
Handle 0x0902, DMI type 9, 13 bytes
System Slot Information
Designation: PCI Slot 2
Type: 64-bit PCI-X
Current Usage: In Use
Length: Long
ID: 2
Characteristics:
3.3 V is provided
PME signal is supported
Hot-plug devices are supported
Handle 0x0903, DMI type 9, 13 bytes
System Slot Information
Designation: PCI Slot 3
Type: 64-bit PCI-X
Current Usage: In Use
Length: Long
ID: 3
Characteristics:
3.3 V is provided
PME signal is supported
Hot-plug devices are supported
Handle 0x1000, DMI type 16, 15 bytes
Physical Memory Array
Location: System Board Or Motherboard
Use: System Memory
Error Correction Type: Single-bit ECC
Maximum Capacity: 12 GB
Error Information Handle: Not Provided
Number Of Devices: 6
Handle 0x1100, DMI type 17, 23 bytes
Memory Device
Array Handle: 0x1000
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 512 MB
Form Factor: DIMM
Set: 1
Locator: DIMM 01
Bank Locator: Not Specified
Type: DDR
Type Detail: Synchronous
Speed: 266 MHz (3.8 ns)
Handle 0x1101, DMI type 17, 23 bytes
Memory Device
Array Handle: 0x1000
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 512 MB
Form Factor: DIMM
Set: 1
Locator: DIMM 02
Bank Locator: Not Specified
Type: DDR
Type Detail: Synchronous
Speed: 266 MHz (3.8 ns)
Handle 0x1102, DMI type 17, 23 bytes
Memory Device
Array Handle: 0x1000
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 1024 MB
Form Factor: DIMM
Set: 2
Locator: DIMM 03
Bank Locator: Not Specified
Type: DDR
Type Detail: Synchronous
Speed: 266 MHz (3.8 ns)
Handle 0x1103, DMI type 17, 23 bytes
Memory Device
Array Handle: 0x1000
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 1024 MB
Form Factor: DIMM
Set: 2
Locator: DIMM 04
Bank Locator: Not Specified
Type: DDR
Type Detail: Synchronous
Speed: 266 MHz (3.8 ns)
Handle 0x1104, DMI type 17, 23 bytes
Memory Device
Array Handle: 0x1000
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 512 MB
Form Factor: DIMM
Set: 3
Locator: DIMM 05
Bank Locator: Not Specified
Type: DDR
Type Detail: Synchronous
Speed: 266 MHz (3.8 ns)
Handle 0x1105, DMI type 17, 23 bytes
Memory Device
Array Handle: 0x1000
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 512 MB
Form Factor: DIMM
Set: 3
Locator: DIMM 06
Bank Locator: Not Specified
Type: DDR
Type Detail: Synchronous
Speed: 266 MHz (3.8 ns)
Handle 0x1300, DMI type 19, 15 bytes
Memory Array Mapped Address
Starting Address: 0x00000000000
Ending Address: 0x000FFFFFFFF
Range Size: 4 GB
Physical Array Handle: 0x1000
Partition Width: 0
Handle 0x1400, DMI type 20, 19 bytes
Memory Device Mapped Address
Starting Address: 0x00000000000
Ending Address: 0x0003FFFFFFF
Range Size: 1 GB
Physical Device Handle: 0x1100
Memory Array Mapped Address Handle: 0x1300
Partition Row Position: 1
Interleave Position: 1
Interleaved Data Depth: Unknown
Handle 0x1401, DMI type 20, 19 bytes
Memory Device Mapped Address
Starting Address: 0x00000000000
Ending Address: 0x0003FFFFFFF
Range Size: 1 GB
Physical Device Handle: 0x1101
Memory Array Mapped Address Handle: 0x1300
Partition Row Position: 1
Interleave Position: 2
Interleaved Data Depth: Unknown
Handle 0x1402, DMI type 20, 19 bytes
Memory Device Mapped Address
Starting Address: 0x00040000000
Ending Address: 0x000BFFFFFFF
Range Size: 2 GB
Physical Device Handle: 0x1102
Memory Array Mapped Address Handle: 0x1300
Partition Row Position: 1
Interleave Position: 1
Interleaved Data Depth: Unknown
Handle 0x1403, DMI type 20, 19 bytes
Memory Device Mapped Address
Starting Address: 0x00040000000
Ending Address: 0x000BFFFFFFF
Range Size: 2 GB
Physical Device Handle: 0x1103
Memory Array Mapped Address Handle: 0x1300
Partition Row Position: 1
Interleave Position: 2
Interleaved Data Depth: Unknown
Handle 0x1404, DMI type 20, 19 bytes
Memory Device Mapped Address
Starting Address: 0x000C0000000
Ending Address: 0x000F50003FF
Range Size: 868353 kB
Physical Device Handle: 0x1104
Memory Array Mapped Address Handle: 0x1300
Partition Row Position: 1
Interleave Position: 1
Interleaved Data Depth: Unknown
Handle 0x1405, DMI type 20, 19 bytes
Memory Device Mapped Address
Starting Address: 0x000C0000000
Ending Address: 0x000F50003FF
Range Size: 868353 kB
Physical Device Handle: 0x1105
Memory Array Mapped Address Handle: 0x1300
Partition Row Position: 1
Interleave Position: 2
Interleaved Data Depth: Unknown
Handle 0x1450, DMI type 20, 19 bytes
Memory Device Mapped Address
Starting Address: 0x00000000000
Ending Address: 0x000000003FF
Range Size: 1 kB
Physical Device Handle: 0x0000
Memory Array Mapped Address Handle: 0x0000
Partition Row Position: 1
Interleave Position: 1
Interleaved Data Depth: Unknown
Handle 0x1451, DMI type 20, 19 bytes
Memory Device Mapped Address
Starting Address: 0x00000000000
Ending Address: 0x000000003FF
Range Size: 1 kB
Physical Device Handle: 0x0000
Memory Array Mapped Address Handle: 0x0000
Partition Row Position: 1
Interleave Position: 2
Interleaved Data Depth: Unknown
Handle 0x2000, DMI type 32, 11 bytes
System Boot Information
Status: Firmware-detected hardware failure
Handle 0xC100, DMI type 193, 7 bytes
OEM-specific Type
Header and Data:
C1 07 00 C1 01 01 02
Strings:
07/25/2003
03/19/2003
Handle 0xC200, DMI type 194, 5 bytes
OEM-specific Type
Header and Data:
C2 05 00 C2 09
Handle 0xC300, DMI type 195, 5 bytes
OEM-specific Type
Header and Data:
C3 05 00 C3 01
Strings:
$0E110727
Handle 0xC400, DMI type 196, 5 bytes
OEM-specific Type
Header and Data:
C4 05 00 C4 00
Handle 0xC506, DMI type 197, 10 bytes
OEM-specific Type
Header and Data:
C5 0A 06 C5 06 04 06 00 FF 01
Handle 0xC500, DMI type 197, 10 bytes
OEM-specific Type
Header and Data:
C5 0A 00 C5 00 04 00 01 FF 02
Handle 0xC600, DMI type 198, 10 bytes
OEM-specific Type
Header and Data:
C6 0A 00 C6 17 00 00 01 3C 00
Handle 0xC700, DMI type 199, 52 bytes
OEM-specific Type
Header and Data:
C7 34 00 C7 11 00 00 00 03 20 13 06 25 0F 00 00
15 00 00 00 03 20 04 06 29 0F 00 00 1D 00 00 00
03 20 16 01 24 0F 00 00 38 00 00 00 03 20 04 06
27 0F 00 00
Handle 0xCD00, DMI type 205, 22 bytes
OEM-specific Type
Header and Data:
CD 16 00 CD 01 01 46 41 54 78 00 00 F1 FF 00 00
00 00 00 80 02 00
Handle 0xCA00, DMI type 202, 8 bytes
OEM-specific Type
Header and Data:
CA 08 00 CA 00 11 FF 01
Handle 0xCA01, DMI type 202, 8 bytes
OEM-specific Type
Header and Data:
CA 08 01 CA 01 11 FF 02
Handle 0xCA02, DMI type 202, 8 bytes
OEM-specific Type
Header and Data:
CA 08 02 CA 02 11 FF 03
Handle 0xCA03, DMI type 202, 8 bytes
OEM-specific Type
Header and Data:
CA 08 03 CA 03 11 FF 04
Handle 0xCA04, DMI type 202, 8 bytes
OEM-specific Type
Header and Data:
CA 08 04 CA 04 11 FF 05
Handle 0xCA05, DMI type 202, 8 bytes
OEM-specific Type
Header and Data:
CA 08 05 CA 05 11 FF 06
Handle 0x7F00, DMI type 127, 4 bytes
End Of Table
^ permalink raw reply [flat|nested] 50+ messages in thread
end of thread, other threads:[~2009-05-25 13:12 UTC | newest]
Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-10-17 12:32 [PATCH] NFS regression in 2.6.26?, "task blocked for more than 120 seconds" Max Kellermann
2008-10-17 14:33 ` Glauber Costa
2008-10-20 6:51 ` Max Kellermann
2008-10-20 7:43 ` Ian Campbell
2008-10-20 13:15 ` Glauber Costa
2008-10-20 14:12 ` Max Kellermann
2008-10-20 14:34 ` Cyrill Gorcunov
2008-10-20 14:21 ` Cyrill Gorcunov
2009-05-22 20:59 ` H. Peter Anvin
2009-05-25 13:12 ` Max Kellermann
2008-10-20 6:27 ` Ian Campbell
2008-11-01 11:45 ` Ian Campbell
2008-11-01 13:41 ` Trond Myklebust
2008-11-02 14:40 ` Ian Campbell
2008-11-07 2:12 ` kenneth johansson
2008-11-04 19:10 ` Ian Campbell
2008-11-25 7:09 ` Ian Campbell
2008-11-25 13:28 ` Trond Myklebust
2008-11-25 13:38 ` Ian Campbell
2008-11-25 13:57 ` Trond Myklebust
2008-11-25 14:04 ` Ian Campbell
2008-11-26 22:12 ` Ian Campbell
2008-12-01 0:17 ` [PATCH 0/3] " Trond Myklebust
2008-12-01 0:18 ` [PATCH 1/3] SUNRPC: Ensure the server closes sockets in a timely fashion Trond Myklebust
2008-12-17 15:27 ` Tom Tucker
2008-12-17 18:08 ` Trond Myklebust
2008-12-17 18:59 ` Tom Tucker
2008-12-01 0:19 ` [PATCH 2/3] SUNRPC: We only need to call svc_delete_xprt() once Trond Myklebust
2008-12-01 0:20 ` [PATCH 3/3] SUNRPC: svc_xprt_enqueue should not refuse to enqueue 'XPT_DEAD' transports Trond Myklebust
2008-12-17 15:35 ` Tom Tucker
2008-12-17 19:07 ` Trond Myklebust
2008-12-23 14:49 ` Tom Tucker
2008-12-23 23:39 ` Tom Tucker
2008-12-01 0:29 ` [PATCH 0/3] NFS regression in 2.6.26?, "task blocked for more than 120 seconds" Trond Myklebust
2008-12-02 15:22 ` Kasparek Tomas
2008-12-02 15:37 ` Trond Myklebust
2008-12-02 16:26 ` Kasparek Tomas
2008-12-02 18:10 ` Trond Myklebust
2008-12-01 22:09 ` Ian Campbell
2008-12-06 12:16 ` Ian Campbell
2008-12-14 18:24 ` Ian Campbell
2008-12-16 17:55 ` J. Bruce Fields
2008-12-16 18:39 ` Ian Campbell
2009-01-07 22:21 ` J. Bruce Fields
2009-01-08 18:20 ` J. Bruce Fields
2009-01-08 21:22 ` Ian Campbell
2009-01-08 21:26 ` J. Bruce Fields
2009-01-12 9:46 ` Ian Campbell
2009-01-22 8:27 ` Ian Campbell
2009-01-22 16:44 ` J. Bruce Fields
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).